We study robustness properties of several procedures for joint estimation of shape and scale in a generalized
Pareto model. The estimators we primarily focus on, MBRE and OMSE, are one-step estimators distinguished
as optimally-robust in the shrinking neighborhood setting, i.e., they minimize the maximal bias and, on a
specific such neighborhood, the maximal mean squared error, respectively. For their initialization, we
propose a particular Location-Dispersion estimator, MedkMAD, which matches the population median and
kMAD (an asymmetric variant of the median of absolute deviations) against their empirical counterparts.

These optimally-robust estimators are compared to maximum likelihood, skipped maximum likelihood,
Cramér-von-Mises minimum distance, method of medians, and Pickands estimators. To quantify their
deviation from robust optimality, for each of these suboptimal estimators, we determine the finite sample
breakdown point, the influence function, as well as the statistical accuracy measured by asymptotic bias,
variance, and mean squared error, all evaluated uniformly on shrinking neighborhoods. These asymptotic
findings are complemented by an extensive simulation study to assess the finite sample behavior of the
considered procedures. Applicability of the procedures and their stability against outliers is illustrated on
the Danish fire insurance data set from R package evir.
1. Introduction

This paper deals with optimally-robust parameter estimation in generalized Pareto distributions
(GPDs). These arise naturally in many situations where one is interested in the behavior of
extreme events, as motivated by the Pickands-Balkema-de Haan extreme value theorem (PBHT),
cf. Balkema and de Haan [2], Pickands [39]. The application we have in mind is the calculation
of the regulatory capital required by Basel II [1] for a bank to cover operational risk, see
H., R. and Bae [24]. In this context, the tail behavior of the underlying distribution is crucial.
This is where extreme value theory enters, suggesting to estimate these high quantiles
parametrically using, e.g., GPDs, see Neslehova et al. [37]. Robust statistics in this context
offers procedures that bound the influence of single observations and so provide reliable
inference in the presence of moderate deviations from the distributional model assumptions,
respectively from the mechanisms underlying the PBHT.

Literature: Estimating the three-parameter GPD, i.e., with parameters for threshold, scale,
and shape, has long been a challenging problem for statisticians, with many proposed
approaches. In this context, estimation of the threshold is an important topic of its own but is
not covered by the framework used in this paper. Here we rather limit ourselves to joint
estimation of scale and shape and assume the threshold to be known. For threshold estimation
we refer to Beirlant et al. [3, 4], while robustifications of this problem can be found in
Dupuis [11], Dupuis and Victoria-Feser [14], and Vandewalle et al. [53].

We also do not discuss non-parametric or semiparametric approaches for modelling
the tail events (absolute or relative excesses over the high threshold) only specifying the
tail index $\alpha$ through the number of exceedances over a high threshold. The most popular
estimator in this family is the Hill estimator [23]; for a survey on approaches of this
kind, see Tsourti [51]. With their semi/non-parametric nature, these methods can take into
account the fact that the GPD is only justified asymptotically by the PBHT and for finite
samples is merely a proxy for the exceedances distribution. On the other hand, none of
these estimators considers an unknown scale parameter directly, but rather defines it depending
on the shape, so these estimators do not fall into the framework studied in this paper.

In the parametric context, for estimation of scale and shape of a GPD, the maximum
likelihood estimator (MLE) is highly popular among practitioners, and has been studied in
detail by Smith [50]. This popularity is largely justified for the ideal model by the
(asymptotic) results on its efficiency, see van der Vaart [52, Ch. 8], by which the MLE achieves
highest accuracy in quite a general setup.

The MLE loses this optimality, however, when passing over to only slightly distorted
distributions, which calls for robust alternatives. To study the instability of the MLE, Cope
et al. [8] consider skipping some extremal data peaks, with the rationale to reduce the
influence of extreme values. Roughly speaking, this amounts to using a Skipped Maximum
Likelihood Estimator (SMLE), which enjoys some popularity among practitioners. Close
to it, but bias-corrected, is the weighted likelihood method proposed in Dupuis and
Morgenthaler [12]. Dupuis [11] studies optimally bias-robust estimators (OBRE) as derived
in [22, 2.4 Thm. 1], realized as M-estimators.

Generalizing He and Fung [19] to the GPD case, Peng and Welsh [38] propose a method
of medians estimator, which is based on solving the implicit equations matching the
population medians of the score function to the data coordinatewise.

Pickands estimator (PE) [39] matches certain empirical quantiles against the model ones
and stands out for its closed-form representation. This idea has been generalized to the
Elementary Percentile Method (EPM) by Castillo and Hadi [7].

Another line of research may be grouped into moment-based estimators, matching empirical
(weighted, trimmed) moments of original or transformed observations against their
model counterparts. For the first and second moments of the original observations this
gives the Method of Moments (MOM); for the probability-transform scaled observations
this leads to Probability Weighted Moments (PWM), see Hosking and Wallis [25]; a hybrid
method of these two is studied in Dupuis and Tsao [13]; with the likelihood scale, this
gives the Likelihood Moment Method (LME) as in Zhang [55]. Brazauskas and Kleefeld [5]
cover trimmed moments. Clearly, except for the last one, all these methods are restricted
to cases where the respective population moments are finite, which may preclude some
of them for certain applications: for operational risk data even first moments may not
exist [37], so ordinary MOM estimators cannot be used in these cases.

Examples of minimum distance type estimators like the Minimum Density Power Divergence
Estimator (MDPDE) or the Maximum Goodness-of-Fit Estimator (MGF) can be
found in Juarez and Schucany [28] and Luceño [33], respectively.

Considered estimators: Except for Dupuis [11], none of the mentioned robustifications
aims at robust optimality. This is the topic of this paper. In the GPD setup, we study
estimators distinguished as optimal, i.e., the maximum likelihood estimator (MLE), the most
bias-robust estimator minimizing the maximal bias (MBRE), and the estimator minimizing
the maximal MSE on gross error neighborhoods about the GPD model, when the radius of
contamination is known (OMSE) and not known (RMXE). These estimators need
globally robust initialization estimators; for this purpose we consider Pickands estimator
(PE), the method-of-medians estimator (MMed) and a particular Location-Dispersion
(LD) estimator, MedkMAD. From our application of these estimators to operational risk,
we take the skipped maximum likelihood estimator (SMLE) and the Cramér-von-Mises
Minimum Distance estimator (MDE) as competitors.

Contribution of this article: Our contribution is a translation of asymptotic optimality
from Rieder [42] to the GPD context and the derivation of the optimally-robust estimators
MBRE, OMSE, and RMXE in this context, together with their equivariance properties in
Proposition 3.3. This also comprises an actual implementation to determine the respective
influence functions in R, including a considerable speed-up by interpolation with
Algorithm 4.4. Moreover, for initialization of MLE, MBRE, OMSE, and RMXE, we propose a
computationally efficient starting estimator with a high breakdown point, the MedkMAD
estimator, which improves known initialization-free estimators considerably. For its
distinction from alternatives, common finite sample breakdown point notions to assess global
robustness have to be replaced by the concept of expected finite sample breakdown point
introduced in R. & H. [47]. While the optimality results of Rieder [42] do not quantify
suboptimality of competitor estimators, our synopsis in Section 4.5 provides a detailed
discussion of this issue. To this end, in Appendix A, in Propositions A.1-A.6, we provide
a variety of largely unpublished results on influence functions, asymptotic (co)variances,
(maximal) biases, and breakdown points of the considered estimators. The optimality
theory we use is confined to an asymptotic framework for sample size tending to infinity;
the simulation results of Section 5 however close this gap by establishing finite sample
optimality down to sample size 40.

Structure of the paper: In Section 2 we define the ideal model and summarize its
smoothness and invariance properties, and then extend this ideal setting by defining
contamination neighborhoods. Section 3 provides basic global and local robustness concepts
and recalls the influence functions of optimally robust estimators; it also introduces several
efficiency concepts. Section 4 introduces the considered estimators, discusses some
computational and numerical aspects, and summarizes the respective robustness properties
in a synopsis. A simulation study in Section 5 checks the validity of the asymptotic concepts
at finite sample sizes. To illustrate the stability of the considered estimators on a real data
set, in Section 6 we evaluate the estimators on the Danish fire insurance data set of R package
evir [35] and on a modified version of it containing 1.5% outliers. Our conclusions are
presented in Section 7. Appendix A provides the calculations behind our results in the
synopsis section. Proofs are provided in Appendix E.



2. Model Setting 

2.1. Generalized Pareto Distribution 

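The three-parameter generalized Pareto distribution (GPD) has c.d.f. and density

$$F_\vartheta(x) = 1 - \Big(1 + \xi\,\frac{x-\mu}{\beta}\Big)^{-1/\xi}, \qquad f_\vartheta(x) = \frac{1}{\beta}\Big(1 + \xi\,\frac{x-\mu}{\beta}\Big)^{-1/\xi-1} \tag{2.1}$$

where $x \ge \mu$ for $\xi \ge 0$, and $\mu \le x \le \mu - \beta/\xi$ if $\xi < 0$. It is parametrized by
$\vartheta = (\xi,\beta,\mu)^\tau$, for location $\mu$, scale $\beta > 0$ and shape $\xi$. Special cases of GPDs are the
uniform ($\xi = -1$), the exponential ($\xi = 0$, $\mu = 0$), and Pareto ($\xi > 0$, $\beta = 1$) distributions.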
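As a minimal sketch of the model just defined (in Python rather than the R used for the paper's own computations), the c.d.f., density, and quantile function of the GPD can be written as follows. The function names `gpd_cdf`, `gpd_pdf`, and `gpd_quantile` are illustrative assumptions and not part of any package referred to in the text; the case $\xi = 0$ is treated as the exponential limit.

```python
import math

def gpd_cdf(x, xi, beta, mu=0.0):
    """C.d.f. of the GPD with shape xi, scale beta > 0 and location mu."""
    z = (x - mu) / beta
    if z <= 0.0:
        return 0.0
    if xi == 0.0:                      # exponential limit
        return 1.0 - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0.0:                       # at or beyond the upper endpoint (xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def gpd_pdf(x, xi, beta, mu=0.0):
    """Density of the GPD; zero outside the support."""
    z = (x - mu) / beta
    if z < 0.0:
        return 0.0
    if xi == 0.0:
        return math.exp(-z) / beta
    t = 1.0 + xi * z
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / xi - 1.0) / beta

def gpd_quantile(u, xi, beta, mu=0.0):
    """Quantile function (inverse c.d.f.) for 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    if xi == 0.0:
        return mu - beta * math.log1p(-u)
    return mu + beta * ((1.0 - u) ** (-xi) - 1.0) / xi
```

For instance, `gpd_quantile(0.5, 0.7, 1.0)` gives the median $(2^{0.7}-1)/0.7$ at the reference parameters $\xi = 0.7$, $\beta = 1$.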
We limit ourselves to the case of known location $\mu = 0$ here; for shape values of $\xi > 0$,
the GPD is a good candidate for modeling distributional tails exceeding threshold $\mu$, as
motivated by the PBHT, but for simplicity we do not make this restriction in this paper; with
this restriction, corresponding log-transformations as discussed later for scale $\beta$ would
also be helpful for shape $\xi$. For all graphics and both numerical evaluations and
simulations, we use the reference parameter values $\beta = 1$ and $\xi = 0.7$. For known $\mu$, the model
is smooth in $\theta = (\xi,\beta)$:

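Proposition 2.1: For given $\mu$ and at any $\xi \in \mathbb{R}$, $\beta > 0$, the GPD model from (2.1) is
L2-differentiable w.r.t. $(\beta,\xi)$, with L2-derivative (or scores)

$$\Lambda_\theta(z) = \Big(\frac{1}{\xi^2}\log(1+\xi z) - \frac{\xi+1}{\xi}\,\frac{z}{1+\xi z}\,;\ -\frac{1}{\beta} + \frac{\xi+1}{\beta}\,\frac{z}{1+\xi z}\Big)^\tau, \qquad z = \frac{x-\mu}{\beta} \tag{2.2}$$

and finite Fisher information $\mathcal{I}_\theta$

$$\mathcal{I}_\theta = \frac{1}{(2\xi+1)(\xi+1)} \begin{pmatrix} 2 & \beta^{-1} \\ \beta^{-1} & \beta^{-2}(\xi+1) \end{pmatrix} \tag{2.3}$$

As $\mathcal{I}_\theta$ is positive definite for $\xi \in \mathbb{R}$, $\beta > 0$, the model is (locally) identifiable.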
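The scores (2.2) and the Fisher information (2.3) can be checked against each other numerically. The following Python sketch assumes $\mu = 0$ and $\xi \neq 0$; the helper `expect` is a hypothetical name for a crude midpoint-rule approximation of an expectation under the ideal GPD, obtained via the quantile transform.

```python
import math

def gpd_scores(x, xi, beta, mu=0.0):
    """Scores (2.2): derivatives of log f w.r.t. (xi, beta); assumes xi != 0."""
    z = (x - mu) / beta
    t = 1.0 + xi * z
    d_xi = math.log(t) / xi**2 - (xi + 1.0) / xi * z / t
    d_beta = -1.0 / beta + (xi + 1.0) / beta * z / t
    return d_xi, d_beta

def gpd_fisher(xi, beta):
    """Fisher information (2.3) for theta = (xi, beta)."""
    c = 1.0 / ((2.0 * xi + 1.0) * (xi + 1.0))
    return [[2.0 * c, c / beta], [c / beta, c * (xi + 1.0) / beta**2]]

def expect(g, xi, beta, n=100_000):
    """Midpoint-rule approximation of E g(X), X ~ GPD(xi, beta), mu = 0."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        x = beta * ((1.0 - u) ** (-xi) - 1.0) / xi   # quantile transform
        total += g(x)
    return total / n
```

At the reference values $\xi = 0.7$, $\beta = 1$, the approximated second moments of the two score components should be close to the diagonal entries of `gpd_fisher(0.7, 1.0)`, and the approximated score means close to zero.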
In-/Equivariance. The model for given $\mu$ is scale invariant in the sense that for $X$ a
random variable (r.v.) with law $\mathcal{L}(X) = F_{(\xi,1,\mu)}$, for $\beta > 0$ also $\mathcal{L}(\beta X) = F_{(\xi,\beta,\beta\mu)}$ is in
the model. Using matrix $d_\beta = \mathrm{diag}(1,\beta)$, correspondingly, an estimator $S$ for $\theta = (\xi,\beta)$
is called (scale-)equivariant if

$$S(\beta x_1,\ldots,\beta x_n) = d_\beta\,S(x_1,\ldots,x_n) \tag{2.4}$$

However, no such in-/equivariance is evident for the shape component.

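Later on, it turns out to be useful to transform the scale parameter to logarithmic scale,
because of the breakdown of scale estimates, see Lemma 3.4 below, i.e., to estimate
$\tilde\beta = \log\beta$, $\beta = e^{\tilde\beta}$, and afterwards to back-transform the estimate to the original scale by the
exponential. For the transformed model, we write

$$\tilde\beta = \log\beta, \quad \tilde\theta = (\xi,\tilde\beta), \quad \tilde\Lambda_{\tilde\theta}(z) = \frac{\partial}{\partial\tilde\theta}\log f_\theta(z), \quad \tilde{\mathcal{I}}_{\tilde\theta} = \mathrm{E}_{\tilde\theta}\,\tilde\Lambda_{\tilde\theta}\tilde\Lambda_{\tilde\theta}^\tau \tag{2.5}$$

On log-scale, scale equivariance (2.4) translates into a shift equivariance: an estimator
$\tilde S$ for $\tilde\theta = (\xi,\tilde\beta)$ is called (shift-)equivariant if

$$\tilde S(\beta x_1,\ldots,\beta x_n) = \tilde S(e^{\tilde\beta}x_1,\ldots,e^{\tilde\beta}x_n) = \tilde S(x_1,\ldots,x_n) + (0,\tilde\beta)^\tau \tag{2.6}$$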

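Lemma 2.2: For the scores these invariances are reflected by the relations

$$\Lambda_\theta(x) = d_\beta^{-1}\Lambda_{\theta_1}(x/\beta), \quad \mathcal{I}_\theta = d_\beta^{-1}\,\mathcal{I}_{\theta_1}\,d_\beta^{-1}, \quad \tilde\Lambda_{\tilde\theta}(x) = \tilde\Lambda_{\tilde\theta_0}(x/\beta), \quad \tilde{\mathcal{I}}_{\tilde\theta} = \tilde{\mathcal{I}}_{\tilde\theta_0} \tag{2.7}$$

where

$$\theta_1 = (\xi,1) \quad\text{respectively}\quad \tilde\theta_0 = (\xi,0) \tag{2.8}$$

and

$$\tilde\Lambda_{\tilde\theta}(x) = d_\beta\,\Lambda_\theta(x) \tag{2.9}$$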
2.2. Deviations from the Ideal Model: Gross Error Model

Instead of working only with ideal distributions, robust statistics considers suitable
distributional neighborhoods about this ideal model. In this paper, we limit ourselves to the
Gross Error Model, i.e., our neighborhoods are the sets of all real distributions $F^{\rm re}$
representable as

$$F^{\rm re} = (1-\varepsilon)F^{\rm id} + \varepsilon F^{\rm di} \tag{2.10}$$

for some given size or radius $\varepsilon > 0$, where $F^{\rm id}$ is the underlying ideal distribution and $F^{\rm di}$
some arbitrary, unknown, and uncontrollable contaminating/distorting distribution which
may vary from observation to observation. For fixed $\varepsilon > 0$, bias and variance of robust
estimators usually scale at different rates ($O(\varepsilon)$ and $O(1/n)$, respectively). Hence, to balance
bias and variance scales, in the shrinking neighborhood approach, see Huber-Carol [27],
Rieder [42, 43], and Bickel [6], one lets the radius of these neighborhoods shrink with
growing sample size $n$, i.e.,

$$\varepsilon = r_n = r/\sqrt{n} \tag{2.11}$$

In reality one rarely knows $\varepsilon$ or $r$, but for situations where this radius is not exactly
known, in Rieder et al. [44] we provide a criterion to choose a radius; this is detailed
in Section 3.3. Our reference radius for our evaluations and simulations is $r = 0.5$.
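To make the gross error model (2.10) with shrinking radius (2.11) concrete, the following Python sketch draws a contaminated sample. It assumes $\mu = 0$ and $\xi \neq 0$; the argument `contaminate` is a hypothetical user-supplied function producing draws from the contaminating distribution $F^{\rm di}$, which the model itself leaves arbitrary.

```python
import math
import random

def shrinking_radius(r, n):
    """Contamination radius eps = r / sqrt(n) from (2.11)."""
    return r / math.sqrt(n)

def sample_gross_error(n, xi, beta, r, contaminate, rng):
    """Draw n observations from (1 - eps) * GPD(xi, beta) + eps * F_di."""
    eps = min(1.0, shrinking_radius(r, n))
    sample = []
    for _ in range(n):
        if rng.random() < eps:
            # observation comes from the contaminating distribution
            sample.append(contaminate(rng))
        else:
            # inverse-c.d.f. draw from the ideal GPD (mu = 0, xi != 0)
            u = rng.random()
            sample.append(beta * ((1.0 - u) ** (-xi) - 1.0) / xi)
    return sample
```

With the reference radius $r = 0.5$ and $n = 100$, the contamination probability is `shrinking_radius(0.5, 100)`, i.e. 5%; a call such as `sample_gross_error(100, 0.7, 1.0, 0.5, lambda rng: 1e6, random.Random(1))` uses a point mass at a large value as one possible, purely illustrative, contamination.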



3. Robust Statistics 

To assess robustness of the considered estimators against these deviations, we study local
properties measuring the infinitesimal influence of a single observation, such as the influence
function (IF), and global ones, like the breakdown point, measuring the effect of massive
deviations.



3.1. Local Robustness: Influence Function and ALEs

For $\delta_x$ the Dirac measure at $x$ and $F_\varepsilon = (1-\varepsilon)F + \varepsilon\delta_x$, Hampel [21] defines the influence
function of a statistical functional $T$ at distribution $F$ and in $x$ as

$$\mathrm{IF}(x; T, F) = \lim_{\varepsilon\to 0}\frac{T(F_\varepsilon) - T(F)}{\varepsilon} \tag{3.1}$$

provided the limit exists. Kohl et al. [31, (introduction)] summarize some pitfalls of
this definition, which in our context however can be avoided: by the Delta method, this
amounts to the question of Hadamard differentiability of the likelihood (MLE, SMLE), of
quantiles (PE, MMed, MedkMAD), and of the c.d.f. (MDE). Indeed, results from
Fernholz [15], Rieder [42, Ch. 1,6] establish that all our estimators are ALEs in the following
sense.

ALEs. Asymptotically linear estimators or ALEs in our GPD model are estimators $S_n$ for
parameter $\theta$, having the expansion in the observations $X_i$ as

$$S_n = \theta + \frac{1}{n}\sum_{i=1}^n \psi_\theta(X_i) + R_n, \qquad \sqrt{n}\,|R_n| \to 0 \quad P_\theta^n\text{-stoch.} \tag{3.2}$$

for $\psi_\theta \in L_2^2(P_\theta)$ the IF of $S_n$, for which we require

$$\mathrm{E}_\theta\,\psi_\theta = 0, \qquad \mathrm{E}_\theta\,\psi_\theta\Lambda_\theta^\tau = \mathbb{I}_2 \tag{3.3}$$

(with $\mathbb{I}_2$ the 2-dim. unit matrix and $L_2^2(P_\theta)$ the set of all 2-dim. r.v.'s $X$ s.t. $\int |X|^2\,dP_\theta < \infty$).
Note that for (3.3) we need L2-differentiability as shown in Proposition 2.1.
Using (2.9) one easily sees that if $\psi_\theta$ is an IF in the model with original scale,

$$\eta_{\tilde\theta}(x) := d_\beta^{-1}\,\psi_\theta(x) \tag{3.4}$$

is an IF in the log-scale model, so there is a one-to-one correspondence between the IFs
in these models.

In the sequel we fix the true parameter value $\theta$ and suppress the respective subscript
where unambiguous. The class of all $\psi \in L_2^2(P)$ satisfying (3.3) is denoted by $\Psi_2$. In the
class of ALEs, the asymptotic variance and the maximal asymptotic bias may be expressed in
terms of the respective IF only, as recalled in the following proposition.

Proposition 3.1: Let $\mathcal{U}_n$ be a sequence of shrinking neighborhoods in the gross error
model (2.10), (2.11) with starting radius $r$. Consider an ALE $S_n$ with IF $\psi$. The
($n$-standardized) asymptotic (co)variance matrix of $S_n$ on $\mathcal{U}_n$ is given by

$$\mathrm{asVar}(S_n) = \int \psi\psi^\tau\,dF \tag{3.5}$$

The $\sqrt{n}$-standardized, maximal asymptotic bias $\mathrm{asBias}(S_n)$ on $\mathcal{U}_n$ is $r\cdot\mathrm{GES}(\psi)$ where

$$\mathrm{GES}(\psi) := \sup_x |\psi(x)| \tag{3.6}$$

is the gross error sensitivity and $|\cdot|$ is the Euclidean norm. The (maximal, $n$-standardized)
asymptotic mean squared error (MSE) $\mathrm{asMSE}(S_n)$ on $\mathcal{U}_n$ is given by

$$\mathrm{asMSE}(S_n) = r^2\,\mathrm{GES}^2 + \mathrm{tr}(\mathrm{asVar}(S_n)) \tag{3.7}$$

For a proof of this proposition we refer to Rieder [42, Rem. 4.2.17(b), Lem. 5.3.3]; for
the notion "gross error sensitivity" see Hampel et al. [22, Ch. 2.1c].
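The decomposition (3.7) is plain arithmetic once the gross error sensitivity and the asymptotic covariance of an ALE are known. A minimal Python sketch, with hypothetical helper names `trace`, `as_bias`, and `as_mse`, reads:

```python
def trace(m):
    """Trace of a square matrix given as a list of rows."""
    return sum(m[i][i] for i in range(len(m)))

def as_bias(r, ges):
    """Maximal asymptotic bias r * GES on a neighborhood of starting radius r."""
    return r * ges

def as_mse(r, ges, as_var):
    """Maximal asymptotic MSE (3.7): r^2 * GES^2 + tr(asVar)."""
    return as_bias(r, ges) ** 2 + trace(as_var)
```

For illustration only (the numbers are not taken from the paper), `as_mse(0.5, 4.0, [[1.0, 0.2], [0.2, 2.0]])` combines a squared bias term of 4 with a variance term of 3. For an estimator with unbounded IF, passing `float("inf")` as `ges` with any $r > 0$ returns an infinite maximal MSE.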
Optimally-robust ALEs. By Proposition 3.1 we may delegate optimizing robustness to
the class of IFs; the optimally-robust IFs are determined in the following proposition due
to [42, Thms. 5.5.7 and 5.5.1].

Proposition 3.2: In our GPD model enlarged by (2.10), (2.11), the unique ALE minimizing
asBias, denoted by MBRE, is given by its IF $\bar\psi$, where $\bar\psi$ is necessarily of form

$$\bar\psi = b\,Y/|Y|, \qquad Y = A\Lambda - a, \qquad b = \max_{a,A}\big\{\mathrm{tr}(A)/\mathrm{E}\,|Y|\big\} \tag{3.8}$$

and the unique ALE minimizing asMSE on a (shrinking) neighborhood of radius $r$, denoted
by OMSE, is given by its IF $\hat\psi$, where $\hat\psi$ is necessarily of form

$$\hat\psi = Y\min\{1, b/|Y|\}, \qquad Y = A\Lambda - a, \qquad r^2 b = \mathrm{E}\,(|Y| - b)_+ \tag{3.9}$$

In both cases $A \in \mathbb{R}^{2\times 2}$, $a \in \mathbb{R}^2$, $b > 0$ are Lagrange multipliers ensuring that $\psi \in \Psi_2$.
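The two functional forms in (3.8) and (3.9) differ only in how the vector $Y = A\Lambda - a$ is rescaled: to constant length $b$ for the MBRE, and clipped to length at most $b$ for the OMSE. The Python sketch below illustrates just this rescaling for given multipliers; it does not solve for $A$, $a$, $b$, and the multipliers used in any example call are hypothetical and need not satisfy the side conditions (3.3).

```python
import math

def y_vector(score, A, a):
    """Y = A * Lambda - a for a 2-dim. score vector Lambda."""
    return [A[i][0] * score[0] + A[i][1] * score[1] - a[i] for i in range(2)]

def psi_mbre_form(score, A, a, b):
    """Form (3.8): b * Y / |Y|, a vector of constant Euclidean length b.

    Undefined where Y = 0 (division by zero).
    """
    y = y_vector(score, A, a)
    norm = math.hypot(y[0], y[1])
    return [b * yi / norm for yi in y]

def psi_omse_form(score, A, a, b):
    """Form (3.9): Y * min(1, b / |Y|), i.e. Y clipped to length at most b."""
    y = y_vector(score, A, a)
    norm = math.hypot(y[0], y[1])
    w = 1.0 if norm <= b else b / norm
    return [w * yi for yi in y]
```

With `A` the identity, `a = [0, 0]` and `b = 2`, a score vector `(3, 4)` of length 5 is shrunk to length 2 by both forms, whereas a score vector `(0.3, 0.4)` is left unchanged by the OMSE form and stretched to length 2 by the MBRE form.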

Invariance. Lemma 2.2 entails an invariance of the optimally-robust IFs, which allows
a reduction to reference scale $\theta_1$ respectively $\tilde\theta_0$ from (2.8) and alleviates computation
considerably, provided that in the original ($\beta$-)scale model we replace the Euclidean norm $n_1$
by

$$n_\beta(x) := |d_\beta^{-1}x| = \sqrt{x_1^2 + x_2^2/\beta^2} \tag{3.10}$$

In particular, by correspondence (3.4) the optimal solutions in original scale and in
log-scale coincide.

Proposition 3.3:

(a) Replacing the Euclidean norm by $n_\beta$ in Proposition 3.2, the optimal IFs are as in (3.8) and
(3.9), where one has to replace expression $\mathrm{tr}(A)$ by $\mathrm{tr}(d_\beta^{-2}A)$ in (3.8).

(b) In the original scale model, with norm $n_\beta$, for $\psi = \bar\psi$ or $\psi = \hat\psi$,

$$\psi_\theta(x) = d_\beta\,\psi_{\theta_1}(x/\beta) \tag{3.11}$$

and the Lagrange multipliers translate according to

$$A_\theta = d_\beta A_{\theta_1} d_\beta, \qquad a_\theta = d_\beta a_{\theta_1}, \qquad b_\theta = b_{\theta_1} \tag{3.12}$$

(c) In the log-scale model with the Euclidean norm, the Lagrange multipliers remain
invariant under parameter changes and, writing $\eta$ for the optimal IFs,

$$\eta_{\tilde\theta}(x) = \eta_{\tilde\theta_0}(x/\beta) \tag{3.13}$$

(d) The optimally-robust IFs with their Lagrange multipliers $A$, $a$, and $b$ in the log-scale
model from (c) are related to the ones in the original scale from (b) by

$$\eta_{\tilde\theta}(x) = d_\beta^{-1}\psi_\theta(x), \qquad A = d_\beta^{-1}A_\theta d_\beta^{-1}, \qquad a = d_\beta^{-1}a_\theta, \qquad b = b_\theta \tag{3.14}$$

In a subsequent construction step, one has to find an ALE achieving the optimal IF.
For this purpose, we use the one-step construction, i.e., to a suitable starting estimator
$\theta_n^{(0)} = \theta_n^{(0)}(X_1,\ldots,X_n)$ and IF $\psi_\theta$, we define

$$S_n = \theta_n^{(0)} + \frac{1}{n}\sum_{i=1}^n \psi_{\theta_n^{(0)}}(X_i) \tag{3.15}$$

For exact conditions on $\theta_n^{(0)}$ see Rieder [42, Ch. 6] or Kohl [29, Sec. 2.3]. Suitable
starting estimators allow to interchange supremum and integration, and asMSE also is the
standardized asymptotic maximal MSE.
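The one-step construction (3.15) is a single correction of the starting estimate by the averaged IF evaluated at that estimate. A generic Python sketch follows; the callable `psi(x, theta)` is a hypothetical interface standing in for whichever IF is to be achieved, and is not the paper's implementation.

```python
def one_step(theta0, sample, psi):
    """One-step construction (3.15): theta0 + (1/n) * sum of psi(x_i, theta0)."""
    n = len(sample)
    k = len(theta0)
    correction = [0.0] * k
    for x in sample:
        p = psi(x, theta0)          # IF evaluated at the starting estimate
        for j in range(k):
            correction[j] += p[j] / n
    return [theta0[j] + correction[j] for j in range(k)]
```

As a toy check outside the GPD model: in a one-dimensional location model the IF of the sample mean is `psi(x, theta) = [x - theta[0]]`, and the one-step construction then returns the sample mean from any starting value.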



3.2. Global Robustness: Breakdown Point 

The breakdown point in the gross error model (2.10) gives the largest radius $\varepsilon$ at which
the estimator still produces reliable results. We take the definitions from Hampel et al.
[22, 2.2 Definitions 1,2]. The asymptotic breakdown point (ABP) $\varepsilon^*$ of the sequence of
estimators $T_n$ for parameter $\theta \in \Theta$ at probability $F$ is given by

$$\varepsilon^* := \sup\Big\{\varepsilon \in (0,1] \;\Big|\; \exists\ \text{compact}\ K_\varepsilon \subset \Theta:\ \pi(F,G) < \varepsilon \implies G(\{T_n \in K_\varepsilon\}) \xrightarrow{n\to\infty} 1\Big\} \tag{3.16}$$

where $\pi$ is the Prokhorov distance. The finite sample breakdown point (FSBP) $\varepsilon_n^*$ of the
estimator $T_n$ at the sample $(x_1,\ldots,x_n)$ is given by

$$\varepsilon_n^*(T_n; x_1,\ldots,x_n) := \frac{1}{n}\max\Big\{m \;;\; \max_{i_1,\ldots,i_m}\ \sup_{y_1,\ldots,y_m} |T_n(z_1,\ldots,z_n)| < \infty\Big\} \tag{3.17}$$

where the sample $(z_1,\ldots,z_n)$ is obtained by replacing the data points $x_{i_1},\ldots,x_{i_m}$ by arbitrary
values $y_1,\ldots,y_m$. Definition (3.17) however does not cover implosion breakdown of the scale
parameter. Passage to the log-scale as in (2.5) provides an easy remedy though, compare
He [18], i.e.,



$$\varepsilon_n^*(T_n; x_1,\ldots,x_n) := \frac{1}{n}\max\Big\{m \;;\; \max_{i_1,\ldots,i_m}\ \sup_{y_1,\ldots,y_m} |\log(T_n(z_1,\ldots,z_n))| < \infty\Big\} \tag{3.18}$$



Expected finite sample breakdown point. For deciding upon which procedure to take
before having made observations, in particular for ranking procedures in a simulation
study, the FSBP from (3.17) has some drawbacks: for some of the considered estimators,
the dependence on possibly highly improbable configurations of the sample entails that
not even a non-trivial lower bound for the FSBP exists. To get rid of this dependence to
some extent at least, while still preserving the finite sample aspect, we use the supplementary
notion of expected FSBP (EFSBP) proposed and discussed in detail in R. & H. [47], i.e.,

$$\bar\varepsilon_n^*(T_n) := \mathrm{E}\,\varepsilon_n^*(T_n; X_1,\ldots,X_n) \tag{3.19}$$

where the expectation is evaluated in the ideal model. We also consider the limit
$\bar\varepsilon^*(T) := \lim_{n\to\infty}\bar\varepsilon_n^*(T_n)$ and also call it EFSBP where unambiguous.

Inheritance of the breakdown point. If the only possible parameter values where breakdown
occurs are at infinity, it is evident from equation (3.15) that for bounded IF, a one-step
estimator inherits the breakdown properties of the starting value $\theta_n^{(0)}$. This is not
true for the scale parameter $\beta$. If the scale component $\beta_n^{(0)} > 0$ of the starting estimate $\theta_n^{(0)}$ is
small, it can easily happen that the scale component of the one-step construction fails to
be positive, entailing an implosion breakdown. Lemma 3.4 below shows that we avoid
this if, in the one-step construction, we pass to log-scale as in (2.5) (and afterwards
back-transform); in the lemma, we write $\psi_2(x;\theta)$ for the scale component of IF $\psi_\theta(x)$ (in the
untransformed model) evaluated at observation $x$ and parameter $\theta$.

Lemma 3.4: Consider construction (3.15) with starting estimator $S_n^{(0)} = (\xi_n^{(0)}, \beta_n^{(0)})^\tau$. If
scale part $\beta_n^{(0)} > 0$ and $\sup_x |\psi_2(x; S_n^{(0)})| = b < \infty$, for scale part $\beta_n$ of one-step estimator
$S_n$ back-transformed from log-scale, we obtain

$$\beta_n = \beta_n^{(0)}\exp\Big(\frac{1}{n\,\beta_n^{(0)}}\sum_{i=1}^n \psi_2(X_i; S_n^{(0)})\Big) > 0 \tag{3.20}$$

and the breakdown point of $\beta_n$ is equal to the one of $\beta_n^{(0)}$.
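The effect described in Lemma 3.4 can be seen in a few lines of Python. The sketch compares the untransformed one-step update of the scale component with the log-scale update (3.20), taking as input the values $\psi_2(X_i; S_n^{(0)})$ of the scale component of the IF; the function names and the numbers in the example are hypothetical.

```python
import math

def scale_step_naive(beta0, psi2_values):
    """Untransformed one-step update of the scale: beta0 + mean(psi2)."""
    return beta0 + sum(psi2_values) / len(psi2_values)

def scale_step_log(beta0, psi2_values):
    """Log-scale update (3.20): beta0 * exp(mean(psi2) / beta0), always > 0."""
    return beta0 * math.exp(sum(psi2_values) / (len(psi2_values) * beta0))
```

For a small starting scale `beta0 = 0.1` and IF values `[-0.5, -0.3, -0.4]`, the naive update gives a negative scale, while the log-scale update stays positive. If $|\psi_2| \le b$, the log-scale update lies between $\beta_n^{(0)} e^{-b/\beta_n^{(0)}}$ and $\beta_n^{(0)} e^{b/\beta_n^{(0)}}$.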



3.3. Efficiency 

To judge the accuracy of an ALE $S = S_n$ it is natural to compare it to the best achievable
accuracy, giving its (asymptotic relative) efficiency eff.id (in the ideal model) defined as

$$\mathrm{eff.id}(S) := \frac{\mathrm{tr}(\mathrm{asVar}(\mathrm{MLE}))}{\mathrm{tr}(\mathrm{asVar}(S))} = \frac{\mathrm{tr}(\mathcal{I}^{-1})}{\mathrm{tr}(\mathrm{asVar}(S))} \tag{3.21}$$

In terms of sample size $n$, (asymptotically) the optimal estimator, i.e., the MLE in our
case, needs $n\cdot(1 - \mathrm{eff.id}(S))$ fewer observations to achieve the same accuracy as $S$.
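As a small worked example of (3.21), the Python sketch below computes eff.id from a given asymptotic covariance matrix. It uses the inverse of the Fisher information (2.3), which works out to $(1+\xi)\begin{pmatrix}1+\xi & -\beta\\ -\beta & 2\beta^2\end{pmatrix}$; the function names are illustrative assumptions.

```python
def mle_as_var(xi, beta):
    """Inverse Fisher information: asymptotic covariance of the MLE of (xi, beta)."""
    c = 1.0 + xi
    return [[c * c, -c * beta], [-c * beta, 2.0 * c * beta**2]]

def eff_id(as_var_s, xi, beta):
    """Efficiency (3.21): tr(asVar(MLE)) / tr(asVar(S)) in the ideal model."""
    v = mle_as_var(xi, beta)
    return (v[0][0] + v[1][1]) / (as_var_s[0][0] + as_var_s[1][1])
```

At $\xi = 0.7$, $\beta = 1$ the trace of `mle_as_var` is $1.7^2 + 2\cdot 1.7 = 6.29$; an estimator whose asymptotic covariance has trace 12.58 would therefore have eff.id of 50%.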

Preserving this sample size interpretation, we extend this efficiency notion to situations
under contamination of known radius $r$ (or realistic conditions) eff.re, defined again as a
ratio w.r.t. the optimal procedure, i.e.,

$$\mathrm{eff.re}(S) = \mathrm{eff.re}(S; r) := \frac{\mathrm{asMSE}(\mathrm{OMSE}_r)}{\mathrm{asMSE}(S)} \tag{3.22}$$



Finally, in Rieder et al. [44], for the situation where radius $r$ is (at least partially) unknown,
we also compute the least favorable efficiency eff.ru

$$\mathrm{eff.ru}(S) := \min_r\,\mathrm{eff.re}(S; r) \tag{3.23}$$

where $r$ ranges in a set of possible radius values (here $r \in [0,\infty)$). The radius $r_0$ maximizing
eff.ru is called least favorable radius. In our reference setting, i.e., for $\xi = 0.7$ and
$\beta = 1$, we obtain $r_0 = 0.486$, which is in fact very close to our chosen reference radius of
0.5.

The procedure we recommend in this setting is the OMSE to $r = r_0$, called radius
maximin estimator (RMXE); it achieves maximin efficiency eff.ru.

Remark 3.5. It is common in robust statistics to use high breakdown point estimators
improved in a reweighting step and tuned to achieve a high efficiency eff.id, usually
95%. This practice to determine the degree of robustness is called the Anscombe criterion
and has its flaws, as the "insurance premium" paid in terms of the 5% efficiency loss does
not reflect the protection "bought", as this protection will vary model-wise, and in our
non-invariant case even $\theta$-wise. Instead, we recommend criteria eff.re and eff.ru to determine
the degree of robustness.

Illustrating this point, in the GPD model at $\xi = 0.7$, tuning the OBRE for eff.id = 95%,
where we indicate this tuning by a respective index for OBRE, we obtain

$$\mathrm{eff.id}(\mathrm{OBRE}_{95\%}) = 95\%, \quad\text{but}\quad \mathrm{eff.ru}(\mathrm{OBRE}_{95\%}) = 14\%,$$
$$\text{while}\quad \mathrm{eff.id}(\mathrm{OMSE}_{r=0.5}) = \mathrm{eff.ru}(\mathrm{OMSE}_{r=0.5}) = 67.8\%$$
$$\text{and}\quad \mathrm{eff.id}(\mathrm{RMXE}) = \mathrm{eff.ru}(\mathrm{RMXE}) = 68.3\%.$$

These 14% indicate an unduly high vulnerability of $\mathrm{OBRE}_{95\%}$ w.r.t. bias. For plots
of the curve $r \mapsto \mathrm{eff.re}(S; r)$ we refer to Rieder et al. [44, p. 26] (up to using reciprocal
values for relative efficiencies); as shown there, the curve is bowl-shaped, decreasing for
$r \to 0, \infty$; $\mathrm{OBRE}_{95\%}$ takes its minimum for $r = \infty$, while for RMXE both local minima,
i.e., at $r = 0$ and $r = \infty$, are equal.



4. Estimators 

In this section we gather the definitions of the estimators considered in this paper; all 
of them are scale-invariant (respectively shift-invariant passing to the log-scale); their 
robustness properties are detailed in Appendix A and summarized in Subsection 4.5. 

4.1. Optimal Estimators 

MLE. The maximum likelihood estimator is the maximizer (in $\theta$) of the (product-log-)
likelihood $l_n(\theta; X_1,\ldots,X_n)$ of our model

$$l_n(\theta; X_1,\ldots,X_n) = \sum_{i=1}^n l_\theta(X_i), \qquad l_\theta(x) = \log f_\theta(x) \tag{4.1}$$

For the GPD, this maximizer has no closed-form solution and has to be determined
numerically, using a suitable initialization; in our simulation study, we use the Hybr
estimator defined below.

Next, we discuss the optimally-robust estimators. By Proposition 3.3 all of them achieve
scale-invariance, respectively shift-invariance when passing to the log-scale as in (2.5), and all
of them use a one-step construction (3.15) with Hybr as starting estimator.
MBRE. Minimizing the maximal bias on convex contamination neighborhoods, we obtain
the MBRE estimator, see Proposition 3.2; in the terminology of Hampel et al. [22] this
is the most B-robust estimator. In most references though, e.g. Dupuis [11], one uses
M-equations instead of one-step constructions to achieve IF $\bar\psi$ from Proposition 3.2. At
$\xi = 0.7$ and $\beta = 1$, we obtain the following Lagrange multipliers $A$, $a$, $b$:

$$A_{\rm MBRE} = \begin{pmatrix} 1.00 & -0.18 \\ -0.18 & 0.22 \end{pmatrix}, \qquad a_{\rm MBRE} = (-0.18, 0.00), \qquad b_{\rm MBRE} = 3.67 \tag{4.2}$$

$b_{\rm MBRE}$ is unique while $A_{\rm MBRE}$ and $a_{\rm MBRE}$ are only unique up to a scalar factor, which in
our context is fixed by setting $A_{1,1} = 1$.

OMSE. For OMSE we proceed similarly as for MBRE, i.e., we determine $\hat\psi$ according to
Proposition 3.2. At $\xi = 0.7$ and $\beta = 1$, we obtain the unique Lagrange multipliers

$$A_{\rm OMSE} = \begin{pmatrix} \text{[illegible]} & -2.89 \\ -2.89 & 3.87 \end{pmatrix}, \qquad a_{\rm OMSE} = (-1.08, 0.12), \qquad b_{\rm OMSE} = 4.40 \tag{4.3}$$

Remark 4.1. OMSE also solves the "Lemma 5 problem" with its own GES as bias bound,
compare [42, Thm. 5.5.7], i.e., among all ALEs it minimizes the (trace of the) asymptotic
variance subject to this bias bound on neighborhood $\mathcal{U}_n$. Hence OMSE is a particular
OBRE in the terminology of Hampel et al. [22], spelt out for the GPD case in Dupuis [11]
(but again using M-equations instead of a one-step construction). She does not aim at
the MSE-optimal bias bound, so our OMSE will in general be better than her OBRE w.r.t.
MSE at radius $r$. On the other hand, for a given bias bound $b$, equations (3.9) also yield
a radius $r(b)$ for which a given OBRE is MSE-optimal. In this sense, bias bound $b$ and
radius $r$ are equivalent parametrizations of the degree of robustness required for the solution.

RMXE. As mentioned, the RMXE is obtained by maximizing eff.ru among all ALEs $S_n$.
By R. and Rieder [48, Thm. 6.1], we have

$$\mathrm{eff.ru}(S_n) = \min\big(\mathrm{eff.id}(S_n),\ \mathrm{GES}^2(\mathrm{MBRE})/\mathrm{GES}^2(S_n)\big) \tag{4.4}$$

and the RMXE is the OBRE with GES $b$ equalling both terms in the min-expression in
(4.4). In our model at $\xi = 0.7$ and $\beta = 1$, we obtain the unique Lagrange multipliers

$$A_{\rm RMXE} = \begin{pmatrix} \text{[illegible]} & -2.87 \\ -2.87 & 3.85 \end{pmatrix}, \qquad a_{\rm RMXE} = (-1.03, 0.12), \qquad b_{\rm RMXE} = 4.44 \tag{4.5}$$
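Relation (4.4) reduces the least favorable efficiency of an ALE to two numbers: its efficiency in the ideal model and its gross error sensitivity relative to that of the MBRE. A one-function Python sketch, with hypothetical argument names, is:

```python
def eff_ru(eff_id_s, ges_mbre, ges_s):
    """Least favorable efficiency (4.4): min(eff.id(S), GES^2(MBRE) / GES^2(S))."""
    return min(eff_id_s, (ges_mbre / ges_s) ** 2)
```

With purely illustrative numbers, `eff_ru(0.9, 2.0, 4.0)` is limited by the bias term, since $(2/4)^2 = 0.25 < 0.9$; for an estimator with unbounded IF, `ges_s = float("inf")` yields a least favorable efficiency of zero.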
Remark 4.2. Passing from MSE to another risk does not in general invalidate our optimality,
compare R. and Rieder [48, Thm. 3.1]. Whenever the asymptotic risk is representable
as $G(\mathrm{tr\,asVar}, |\mathrm{asBias}|)$ for some function $G$ isotone in both arguments, the optimal IF
is again in the class of OBRE estimators, with possibly another bias weight. In addition,
the RMXE for MSE is simultaneously optimal for all homogeneous risks of this form with
continuous $G$ (Thm. 6.1 loc. cit.). In particular, for one-dimensional parameter, this covers
all risks of type $\mathrm{E}\,|S_n - \theta|^p$ for any $p \in [1,\infty)$.



4.2. Starting Estimators 

Initializations for the estimators discussed so far are provided by the next group of
estimators (PE, MMed, MedkMAD, Hybr). They can all be shown to fulfill the requirements
given in Rieder [42, Ch. 6]; in particular they are uniformly $\sqrt{n}$-tight on our shrinking
neighborhoods. Corresponding proofs are available upon request.

PE. Estimators based on the empirical quantiles of the GPD are described in the Elementary
Percentile Method (EPM) by Castillo and Hadi [7]. Pickands' estimator (PE), a special
case of EPM, is based on the empirical 50% and 75% quantiles $\hat Q_2$ and $\hat Q_3$, respectively,
and was first proposed by Pickands [39]. The construction behind PE is not limited
to 50% and 75% quantiles. More specifically, let $a > 1$ and consider the empirical
$\alpha_i$-quantiles for $\alpha_1 = 1 - 1/a$ and $\alpha_2 = 1 - 1/a^2$, denoted by $\hat Q_2(a)$, $\hat Q_3(a)$, respectively.
Then PE is obtained for $a = 2$, and as theoretical quantiles we obtain
$Q_2(a) = \frac{\beta}{\xi}(a^\xi - 1)$, $Q_3(a) = \frac{\beta}{\xi}(a^{2\xi} - 1)$, and the (generalized) PE denoted by PE($a$) for $\xi$ and $\beta$ is

$$\hat\xi = \frac{1}{\log a}\,\log\frac{\hat Q_3(a) - \hat Q_2(a)}{\hat Q_2(a)}, \qquad \hat\beta = \hat\xi\,\frac{\hat Q_2(a)^2}{\hat Q_3(a) - 2\hat Q_2(a)} \tag{4.6}$$
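The closed form (4.6) translates directly into code. In the Python sketch below, `empirical_quantile` uses a simple order-statistic convention, which is an assumption (other quantile definitions are equally possible); the formula itself requires $\hat Q_3(a) > \hat Q_2(a) > 0$ and $\hat Q_3(a) \neq 2\hat Q_2(a)$, i.e. it degenerates for $\hat\xi = 0$.

```python
import math

def empirical_quantile(sorted_x, alpha):
    """Order-statistic quantile x_(ceil(alpha * n)); one of several conventions."""
    n = len(sorted_x)
    idx = min(n - 1, max(0, math.ceil(alpha * n) - 1))
    return sorted_x[idx]

def pickands_from_quantiles(q2, q3, a=2.0):
    """PE(a) from (4.6), given the (1 - 1/a)- and (1 - 1/a^2)-quantiles."""
    xi = math.log((q3 - q2) / q2) / math.log(a)
    beta = xi * q2**2 / (q3 - 2.0 * q2)
    return xi, beta

def pickands(sample, a=2.0):
    """Generalized Pickands estimator PE(a) for (xi, beta) from a sample."""
    xs = sorted(sample)
    q2 = empirical_quantile(xs, 1.0 - 1.0 / a)
    q3 = empirical_quantile(xs, 1.0 - 1.0 / a**2)
    return pickands_from_quantiles(q2, q3, a)
```

Plugging the theoretical quantiles $Q_2(a)$, $Q_3(a)$ into `pickands_from_quantiles` recovers $(\xi, \beta)$ exactly up to rounding, which is a convenient consistency check of (4.6).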

MMed. The method of medians estimator of Peng and Welsh [38] consists of fitting the
(population) medians of the two coordinates of the score function $\Lambda_\theta$ against the
corresponding sample medians of $\Lambda_\theta$, i.e., we have to solve the system of equations

$$\mathrm{median}(X_i)/\beta = m_\xi, \quad\text{for}\quad m_\xi := F_{\theta_1}^{-1}(1/2) = (2^\xi - 1)/\xi \tag{4.7}$$

$$\mathrm{median}\Big(\log(1 + \xi X_i/\beta)/\xi^2 - (1+\xi)X_i\,(\beta\xi + \xi^2 X_i)^{-1}\Big) = M(\xi) \tag{4.8}$$

where $M(\xi)$ is the population median of the $\xi$-coordinate of $\Lambda_{\theta_1}(X)$ with $X \sim \mathrm{GPD}(\theta_1)$.
Solving the first equation for $\beta$ and plugging the corresponding expression into the
second equation, we obtain a one-dimensional root-finding problem to be solved, e.g. in
R by uniroot.

MedkMAD. Instead of matching empirical moments against their model counterparts,
an alternative is to match corresponding location and dispersion measures; this gives
Location-Dispersion estimators, introduced by Marazzi and Ruffieux [34]. While a natural
candidate for the location part is given by the median, for the dispersion measure, promising
candidates are given by the median of absolute deviations MAD and the alternatives
Qn and Sn introduced in Rousseeuw and Croux [45], producing estimators MedMAD,
MedQn, and MedSn, respectively. All these pairs are well known for their high breakdown
point in location-scale models, jointly attaining the highest possible ABP of 50%
among all affine equivariant estimators at symmetric, continuous univariate distributions.
For results on MedQn and MedSn, see R. & H. [47]. These results justify our restriction
to Med(k)MAD for the GPD model in this paper.

Due to the considerable skewness to the right of the GPD, MedMAD can be improved 
by using a dispersion measure that takes this skewness into account. For a distribution F 
on $\mathbb{R}$ with median $m$ let us define for $k > 0$

$$\mathrm{kMAD}(F,k) := \inf\{t > 0 \mid F(m + kt) - F(m - t) \ge 1/2\} \tag{4.9}$$

where k in our case is chosen to be a suitable number larger than 1, and k = 1 would
reproduce the MAD. Within the class of intervals about the median m with covering
probability 50%, we only search those where the part to the right of m is k times longer than
the one to the left of m. Whenever F is continuous, kMAD preserves the FSBP of the MAD
of 50%. The corresponding estimator for $\xi$ and $\beta$ is called MedkMAD and consists of
two estimating equations. The first equation is for the median of the GPD, which is
$m = m(\xi,\beta) = \beta(2^{\xi} - 1)/\xi$. The second equation is for the respective kMAD, which
has to be solved numerically as the unique root $M$ of $f_{m,\xi,\beta;k}(M)$ for

$$f_{m,\xi,\beta;k}(M) = 1/2 + v_{m,M,\xi,\beta}(k) - v_{m,M,\xi,\beta}(-1) \tag{4.10}$$

where $v_{m,M,\xi,\beta}(s) := (1 + \xi(sM + m)/\beta)^{-1/\xi}$.
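The population kMAD of a GPD can be computed with a one-dimensional root search, as the following Python sketch illustrates. It solves $F(m + kM) - F(m - M) = 1/2$ by bisection on $[0, m]$; the bracket is valid because the left-hand side is $0$ at $M = 0$ and at least $F((1+k)m) \ge 1/2$ at $M = m$. Function names and the tolerance are illustrative assumptions; the paper itself only states that the equation is solved numerically.

```python
def gpd_cdf(x, xi, beta):
    """C.d.f. of GPD(0, xi, beta) for xi > 0; zero left of the support."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)


def gpd_median(xi, beta):
    """Median m(xi, beta) = beta * (2^xi - 1) / xi."""
    return beta * (2.0 ** xi - 1.0) / xi


def population_kmad(xi, beta, k, tol=1e-12):
    """Root M of F(m + k*M) - F(m - M) - 1/2 by bisection on [0, m]."""
    m = gpd_median(xi, beta)

    def g(t):
        return gpd_cdf(m + k * t, xi, beta) - gpd_cdf(m - t, xi, beta) - 0.5

    lo, hi = 0.0, m          # g(lo) = -1/2 < 0 <= g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the estimator itself, the same two equations would be solved the other way round, i.e., for $(\xi, \beta)$ given the sample median and the sample kMAD.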

Hybr Still, Table 3 here and Table 9 of R.& H. [46] show failure rates of 8% for n = 40
and 2.3% for n = 100 to solve the MedkMAD equations for k = 10. To lower these rates
we propose a hybrid estimator Hybr, which by default returns MedkMAD for k = 10, and on
failure tries several k-values in a loop (at most 20), returning the first estimator that does not fail.
We start at k = 3.23 (producing maximal ABP), and at each iteration multiply k by 3.
This leads to failure rates of 2.3% for n = 40 and 0.0% for n = 100. Asymptotically, Hybr
coincides with MedkMAD, k = 10.
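The fallback logic of Hybr described above can be sketched as follows. The argument `solve_medkmad` is a hypothetical callable (not part of any package named in the paper) that returns an estimate for a given k, or `None` if the MedkMAD equations could not be solved.

```python
def hybr(sample, solve_medkmad, k_default=10.0, k_start=3.23,
         factor=3.0, max_tries=20):
    """Return (estimate, k): MedkMAD at k_default, else first success in the loop."""
    est = solve_medkmad(sample, k_default)
    if est is not None:
        return est, k_default
    k = k_start                      # start at the k with maximal ABP
    for _ in range(max_tries):       # at most 20 further attempts
        est = solve_medkmad(sample, k)
        if est is not None:
            return est, k
        k *= factor                  # multiply k by 3 in each iteration
    return None, None                # all attempts failed
```

Returning the k that finally succeeded alongside the estimate is a design choice of this sketch; it makes failures of the default k = 10 visible to the caller.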

4.3. Competitor Estimators 

The following estimators were suggested to us in an application to operational risk, see 
R.&H. [46]. 

SMLE Skipped Maximum Likelihood Estimators (SMLE) are ordinary MLEs, skipping
the largest k observations. This has to be distinguished from the better investigated
trimmed/weighted MLE, studied by Field and Smith [16], Hadi and Luceño [17], Vandev
and Neykov [54], Müller and Neykov [36], where trimming/weighting is done according
to the size (in absolute value) of the log-likelihood.

In general these concepts differ, as they refer to different orderings; in our situation
they coincide due to the monotonicity of the likelihood in the observations.

As this skipping is not done symmetrically, it induces a non-vanishing bias $B_n = B_{n,\theta}$
already present in the ideal model. To cope with such biases three strategies can be used,
the first two already considered in detail in Dupuis and Morgenthaler [12, Section 2.2]: (1)
correcting the criterion function for the skipped summands, (2) correcting the estimator
for bias $B_n$, and (3) no bias correction at all, but, conformal to our shrinking neighborhood
setting, letting the skipping proportion $\alpha$ shrink at the same rate. Strategy (3) reflects the
common practice where $\alpha$ is often chosen small, and the bias correction is omitted. In the
sequel, we only study Strategy (3) with $\alpha = \alpha_n = r'/\sqrt{n}$ for some $r'$ larger than the actual
$r$. This way the bias indeed becomes asymptotically negligible:

Lemma 4.3: In our ideal GPD model, the bias $B_n$ of SMLE with skipping rate $\alpha_n$ is
bounded from above by $c\,\alpha_n \log(n)$ for some $c < \infty$, eventually in $n$.
If for some $\zeta \in (0, 1]$, $\liminf_n \alpha_n n^{\zeta} > 0$, then for some $c > 0$ also

$$\liminf_n n^{\zeta} B_n \ge c\,\liminf_n n^{\zeta}\alpha_n \log(n).$$

If $0 < \alpha = \liminf_n \alpha_n < \alpha_0$ for $\alpha_0 = \exp(-3 - 1/\xi)$, then for some $c' > 0$,

$$\liminf_n B_n \ge c'\,\alpha(-\log(\alpha)).$$

It can be shown along the lines of Rieder [42, Thm. 1.6.6] that after subtracting bias $B_n$,
SMLE is indeed an ALE.

MDE General minimum distance estimators (MDEs) are defined as minimizers of a suitable
distance between the theoretical c.d.f. $F$ and the empirical distribution $\hat F_n$. Optimization of this
distance in general has to be done numerically and, as for MLE and SMLE, depends on a
suitable initialization (here again: Hybr). We use the Cramér-von-Mises distance, defined for
c.d.f.'s $F$, $G$ and some $\sigma$-finite measure $\nu$ on the Borel $\sigma$-algebra as

$$d_{\mathrm{CvM}}(F,G)^2 = \int (F(x) - G(x))^2\,\nu(dx) \tag{4.11}$$

i.e., $\mathrm{MDE} = \operatorname{argmin}_\theta\, d_{\mathrm{CvM}}(\hat F_n, F_\theta)$. In this paper we use $\nu = F_\theta$. Another common setting
in the literature uses the empirical, $\nu = \hat F_n$. As shown in Rieder [42, Ex. 4.2.15, Sec 6.3.2],
CvM-MDE belongs to the class of ALEs.
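For $\nu = F_\theta$, the squared Cramér-von-Mises distance between the empirical c.d.f. and $F_\theta$ has a well-known closed form in terms of the ordered probability-integral transforms $u_{(i)} = F_\theta(x_{(i)})$, namely $\frac{1}{12 n^2} + \frac{1}{n}\sum_i \big(u_{(i)} - \frac{2i-1}{2n}\big)^2$. The Python sketch below evaluates this criterion for a GPD; the function names are illustrative, and minimizing it over $(\xi, \beta)$ with a numerical optimizer (started, e.g., at Hybr) would give the MDE.

```python
def gpd_cdf(x, xi, beta):
    """C.d.f. of GPD(0, xi, beta) for xi > 0; zero left of the support."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 + xi * x / beta) ** (-1.0 / xi)


def cvm_distance_sq(sample, xi, beta):
    """Squared CvM distance int (F_n - F_theta)^2 dF_theta in closed form."""
    n = len(sample)
    u = sorted(gpd_cdf(x, xi, beta) for x in sample)
    s = sum((u_i - (2 * i - 1) / (2.0 * n)) ** 2
            for i, u_i in enumerate(u, start=1))
    return s / n + 1.0 / (12.0 * n * n)
```

For a single observation located exactly at the model median, the formula reduces to 1/12, which can be verified by direct integration.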



4.4. Computational and Numerical Aspects 

For computations, we use R, R Development Core Team [40], and add-on
packages ROptEst, Kohl and R. [32], and POT, Ribatet [41], available on the Comprehensive
R Archive Network CRAN, cran.r-project.org.

The Lagrange multipliers A, a, and b of the optimally-robust IFs from
Proposition 3.2 (at the starting estimate) are not available in closed form,
but corresponding algorithms to determine them for each of MBRE, OMSE, and RMXE
are implemented in R within package ROptEst [32], available on CRAN. Although these
algorithms cover general $L_2$-differentiable models, particular extensions are needed for
the computation of the expectations under the heavy-tailed GPD.

Speed-up by interpolation Due to the lack of invariance in $\xi$, solving equations (3.8)
and (3.9) can be quite slow: for any starting estimate the solution has to be computed
anew. Of course, we can reduce the problem by one dimension due to Proposition 3.3,
i.e., we would only need to know the influence functions for "all" values $\xi > 0$. To speed
up computation, we therefore have used the following approximative approach, already
realized in M. Kohl's R package RobLox [30] for the Gaussian one-dimensional location
and scale model¹. In our context, the speed gain obtainable by Algorithm 4.4 is by a
factor of $\approx 125$, and for larger n can be increased by yet another factor of 10 if we skip the
re-centering/standardization and instead return $Y^{\natural}w^{\natural}$.

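Algorithm 4.4 For a grid $\xi_1,\dots,\xi_M$ of values of $\xi$, giving parameter values $\theta_{i,1} = (\xi_i, 1)$
(and for OMSE to given $r = 0.5$), we offline determine the optimal IFs $\psi_{\theta_{i,1}}$, solving
equations (3.8) and (3.9) for each $\theta_{i,1}$, and store the respective Lagrange multipliers $A$, $a$,
and $b$, denoted by $A_i$, $a_i$, $b_i$. In the evaluation of the ALE for given starting estimate $\theta_n^{(0)}$,
we use Proposition 3.3 and pass over to parameter value $\theta' = (\xi_n^{(0)}, 1)$. For $\theta'$, we find
values $A^{\natural}$, $a^{\natural}$, and $b^{\natural}$ by interpolation from the stored grid values $A_i$, $a_i$, $b_i$. This gives us
$Y^{\natural} = A^{\natural}\Lambda_{\theta'} - a^{\natural}$ and $w^{\natural} = \min(1, b^{\natural}/|Y^{\natural}|)$. So far, $Y^{\natural}w^{\natural} \notin \Psi_2(\theta')$, i.e., it does not satisfy
(3.3) at $\theta'$. Thus, similarly to Rieder [42, Rem. 5.5.2], we define $Y^{\sharp} = A^{\sharp}\Lambda_{\theta'} - a^{\sharp}$ for
$a^{\sharp} = A^{\sharp}z^{\sharp}$, $z^{\sharp} = E_{\theta'}[\Lambda_{\theta'}w^{\natural}]/E_{\theta'}[w^{\natural}]$, $A^{\sharp} = \{E_{\theta'}[(\Lambda_{\theta'} - z^{\sharp})(\Lambda_{\theta'} - z^{\sharp})^{\tau}w^{\natural}]\}^{-1}$, and pass
over to $\psi^{\sharp} = Y^{\sharp}w^{\natural}$. By construction $\psi^{\sharp} \in \Psi_2(\theta')$.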



¹ Due to the affine equivariance of MBRE, OBRE, OMSE in the location and scale setting, interpolation in package RobLox
is done only for varying radius r.

4.5. Synopsis of the Theoretical Properties

Breakdown, bias, variance, and efficiencies: In Table 1, we summarize our findings,
evaluating criteria FSBP (where exact values are available), asBias = $r\,$GES, tr asVar,
and asMSE (at r = 0.5). To be able to compare the results for different sample sizes n,
these figures are standardized by sample size n, respectively by $\sqrt{n}$ for the bias. We also
determine efficiencies eff.id, eff.re, and eff.ru. For FSBP of MLE, SMLE, we evaluate
terms at n = 1000, where for SMLE we set $r' = 0.7$, entailing $\alpha_n = 2.2\%$. Finally, we
document the ranges of least favorable x-values $x_{\mathrm{l.f.}}$, at which the considered IFs attain
their GES. These are the most vulnerable points of the respective estimators infinitesimally,
as contamination therein will render bias maximal. In all situations where $x_{\mathrm{l.f.}}$ is
unbounded, a value of $10^{10}$ will suffice to produce maximal bias in the displayed accuracy.
On the other hand, PE and MMed are most harmfully contaminated by smallish values of
about x = 1.5 (for $\beta = 1$).

The results for SMLE are to be read with care: asBias and asMSE do not account for
the bias $B_n$ already present in the ideal model, but only for the extra bias induced by
contamination. Lemma 4.3 entails that $B_n$ is of exact unstandardized order $O(\log(n)/\sqrt{n})$,
hence, asBias and asMSE should both be infinite, and efficiencies in the ideal and contaminated
situation should be 0. For n = 1000, asBias and asMSE are finite: according to Lemma 4.3,
$\sqrt{1000}\,B_{1000} \approx 5.38$, while the entry of 3.75 in Table 1 is just GES.

As noted, MLE achieves the smallest asVar, hence is best in the ideal model, but at the
price of a minimal FSBP and an infinite GES, so at any sample size a single sufficiently large
observation suffices to render MSE arbitrarily large.

MedkMAD gives very convincing results in both asMSE and (E)FSBP. It qualifies as
a starting estimator, as it uses univariate root-finders with parameter-independent search
intervals. The best breakdown behavior so far has been achieved by Hybr, with $\varepsilon^* \approx 1/3$
for a reasonable range of $\xi$-values. MDE shares an excellent reliability with Hybr, but
contrary to the latter needs a reliable starting value for the optimization.

MBRE, OMSE, and RMXE have bounded IFs and are constructed as one-step estimators,
so by Lemma 3.4 inherit the FSBP of the starting estimator (Hybr), while at the same
time MBRE achieves lowest GES (unstandardized by n of order 0.1 at n = 1000), OMSE
is best according to asMSE, and RMXE is best as to eff.ru. RMXE (which is the OMSE
for r = 0.486) and OMSE for r = 0.5, with their radii almost coinciding, are virtually
indistinguishable, guaranteeing an efficiency of 68% over all radii.

We admit that MDE, MedkMAD/Hybr, and MBRE are close competitors in both efficiency
and FSBP, both at given radius r = 0.5 and as to their least favorable efficiencies,
never dropping considerably below 0.5. All other estimators are less convincing.



estimator   asBias   tr asVar   asMSE   eff.id   eff.re   eff.ru   x_l.f.                         ε*_1000
MLE         ∞        6.29       ∞       1.00     0.00     0.00     ∞                              0.00
MBRE        1.84     13.44      16.80   0.47     0.84     0.47     [0.00; ∞)                      0.35*
OMSE        2.20     9.29       14.13   0.68     1.00     0.68     [0.00; 0.07] ∪ [5.92; ∞)       0.35*
RMXE        2.22     9.21       14.14   0.68     1.00     0.68     [0.00; 0.07] ∪ [5.92; ∞)       0.35*
PE          4.08     24.24      40.87   0.26     0.35     0.20     [0.89; 2.34]                   0.06
MMed        2.62     17.45      24.32   0.36     0.58     0.32     [0.00; 0.34] ∪ [0.90; 2.54]    0.25†
MedkMAD     2.19     12.80      17.60   0.49     0.80     0.49     [0.54; 0.89] ∪ [4.42; ∞)       0.31
SMLE        3.75     7.03       21.08   0.90     0.67     0.03     [20.67; ∞)                     0.02
MDE         2.45     9.76       15.74   0.64     0.90     0.56     {0, ∞}                         0.35†

Table 1. Comparison of the asymptotic robustness properties of the estimators.
*: inherited from starting estimator Hybr; †: conjectured.



Influence functions: In Figure 1, we display the IFs $\psi_\theta$ of the considered estimators. The
IF of RMXE visually coincides with the one of OMSE. All IFs are scale invariant so that

$$\psi_\theta(x) = d_\beta\,\psi_{\theta_1}(x/\beta).$$

Intuitively, based on optimality within $L_2(F_\theta)$, to achieve high efficiency, the IF should
be as close as possible in $L_2$-sense to the respective optimal one. So at first glance, MedkMAD
achieves an astonishingly reasonable efficiency in the contaminated situation, although
its IF looks quite different from the optimal one of OMSE; but, of course, this
difference occurs predominantly in regions of low $F_\theta$-probability.

[Figure 1: panels include MDE CvM, MBRE, OMSE/RMXE, and MedkMAD (k = 10); curves for shape $\xi$ and scale $\beta$.]

Figure 1. Influence functions of MLE, SMLE (with $\approx 0.7\cdot\sqrt{n}$ skipped values), MDE CvM, MBRE, OMSE, PE,
MMed, MedkMAD estimators of the generalized Pareto distribution; mind the logarithmic scale of the
x-axis.

Values $\xi \neq 0.7$: The behavior for our reference value $\xi = 0.7$ is typical. The conclusions
we just have drawn as to obtainable efficiencies and the ranking of the procedures largely
remain valid for other parameter values, as visible in Figure 2. The least favorable radii
for $\xi \in [0,2]$ all range in [0.39, 0.51]. Note that due to the scale invariance we do not need
to consider $\beta \neq 1$. From this figure we may in particular see the minimal value for the
efficiencies as extracted in Table 2.



estimator      MLE    PE     MMed   MedkMAD   SMLE   MDE    MBRE   OMSE   RMXE
min_ξ eff.id   1.00   0.16   0.07   0.40      0.00   0.45   0.41   0.58   0.63
min_ξ eff.re   0.00   0.24   0.12   0.78      0.00   0.69   0.78   1.00   0.98
min_ξ eff.ru   0.00   0.15   0.07   0.40      0.00   0.43   0.41   0.58   0.63

Table 2. Minimal efficiencies for ξ varying in [0, 2] in the ideal model and for contamination of known and unknown
radius.



5. Simulation Study 

5.1. Setup 

For sample size n = 40, we simulate data from the ideal GPD with parameter values
$\mu = 0$, $\xi = 0.7$, $\beta = 1$. Additional tables and plots for n = 100, 1000 can be found in R.&
H. [46]. We evaluate the estimators from the previous section at M = 10000 runs in the
respective situation (ideal/contaminated).
[Figure 2: efficiency panels "ideal situation", "cont. situation, radius r = 0.5 known", and "cont. situation, radius unknown"; curves for MLE, PE, MMed, MedkMAD, SMLE, MDE, MBRE, OMSE, RMXE.]

Figure 2. Efficiencies for varying shape of MLE, SMLE (with $\approx 0.7\cdot\sqrt{n}$ skipped values), CvM-MDE, MBRE,
OMSE, PE, MMed, MedkMAD estimators for scale $\beta = 1$ and varying shape $\xi$.

The contaminated data stem from the (shrinking) Gross Error Model (2.10), (2.11)
with r = 0.5. For n = 40, this amounts to an actual contamination rate of $r_{40} = r/\sqrt{40} = 7.9\%$.
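The contamination rate and a contaminated sample can be sketched in a few lines of Python. The sketch assumes that the Gross Error Model amounts to replacing each observation independently, with probability $r/\sqrt{n}$, by a draw from the contaminating distribution (here a Dirac mass at a user-supplied value); the exact form of (2.10), (2.11) is not restated here, and all function names are illustrative.

```python
import math
import random


def contamination_rate(r, n):
    """Shrinking contamination rate r / sqrt(n)."""
    return r / math.sqrt(n)


def sample_gpd(n, xi, beta, rng):
    """Inverse-c.d.f. sampling from GPD(0, xi, beta), xi > 0."""
    return [beta * ((1.0 - rng.random()) ** (-xi) - 1.0) / xi
            for _ in range(n)]


def sample_contaminated(n, xi, beta, r, outlier, rng):
    """Replace each ideal observation by `outlier` with probability r/sqrt(n)."""
    eps = contamination_rate(r, n)
    return [outlier if rng.random() < eps else x
            for x in sample_gpd(n, xi, beta, rng)]


rng = random.Random(1)
ideal = sample_gpd(40, 0.7, 1.0, rng)
contaminated = sample_contaminated(40, 0.7, 1.0, 0.5, 1e10, rng)
```

For r = 0.5 and n = 40, `contamination_rate` gives about 0.079, i.e., the 7.9% quoted above.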

In contrast to other approaches, for realistic comparisons we allow for estimator-specific
contamination, such that each estimator has to prove its usefulness in its individual worst
contamination situation. This is particularly important for estimators with redescending
IF like PE and MMed, where drastically large observations will not be the worst situation
to produce bias. As contaminating data distribution, we use $G_{n,i} = \mathrm{Dirac}(10^{10})$, except
for estimators PE and MMed, where we use $G_{n,i} = \mathrm{unif}(1.42, 1.59)$ in accordance with
$x_{\mathrm{l.f.}}$ from Table 1.



5.2. Results 

Results are summarized in Table 3. Values for Bias, tr Var, and MSE (standardized by
$\sqrt{40}$ and 40, respectively) all come with corresponding CLT-based 95%-confidence intervals.
Column "NA" gives the failure rate in the computation in percent; basically, these
are failures of MMed or MedkMAD/Hybr to find a zero, which due to the use of Hybr
as initialization are then propagated to MLE, SMLE, MDE, MBRE, OMSE, and RMXE.
Column "time" gives the aggregated computation time in seconds on a recent dual core
processor for the 10000 evaluations of the estimator for ideal and contaminated situation.
For MLE, SMLE, MDE, MBRE, OMSE, and RMXE we do not include the time for evaluating
the starting estimator (Hybr) but only mention the values for the evaluations given
the respective starting estimate. The respective best estimator is printed in bold face.

ideal situation:

estimator   |Bias|            tr Var               MSE                  eff.id   rank   NA      time
MLE         0.55 ±0.05        7.41 ±0.21           7.72 ±0.21           1.00     1      0.53    113
MBRE        0.61 ±0.08        18.62 ±1.56          19.00 ±1.59          0.41     7      0.53    402
OMSE        0.25 ±0.06        9.02 ±0.22           9.08 ±0.21           0.85     2      0.53    783
RMXE        0.21 ±0.06        9.27 ±0.33           9.31 ±0.32           0.83     3      0.53    769
PE          0.85 ±0.27        19.30 ±1.54          20.01 ±1.67          0.39     8      0.00    13
MMed        8.91 ±1.98        1.02e5 ±2423.14      1.02e5 ±2458.24      0.00     10     10.50   168
MedkMAD     0.47 ±0.07        11.55 ±0.30          11.78 ±0.29          0.66     5      8.15    197
Hybr        0.71 ±0.07        11.96 ±0.31          12.46 ±0.30          0.62     6      0.53    223
SMLE        4.70 ±0.06        9.49 ±0.30           31.62 ±0.47          0.24     9      0.53    75
MDE         0.40 ±0.06        10.56 ±0.27          10.72 ±0.25          0.72     4      0.53    384

contaminated situation:

estimator   |Bias|            tr Var               MSE                  eff.re   rank   NA
MLE         394.12 ±22.92     1.37e7 ±1.20e6       1.52e7 ±1.37e6       0.00     10     0.53
MBRE        1.70 ±0.09        20.49 ±1.36          23.37 ±1.39          0.85     4      0.37
OMSE        2.62 ±0.07        13.11 ±0.42          19.98 ±0.60          0.99     2      0.37
RMXE        2.73 ±0.07        12.34 ±0.39          19.80 ±0.57          1.00     1      0.37
PE          2.32 ±0.49        62.25 ±67.90         67.64 ±69.35         0.30     7      0.00
MMed        5.13 ±1.17        3563.54 ±1442.56     3589.87 ±1454.42     0.01     8      4.25
MedkMAD     2.32 ±0.09        18.82 ±0.49          24.21 ±0.67          0.82     6      2.15
Hybr        2.23 ±0.09        19.23 ±0.50          24.21 ±0.67          0.82     5      0.02
SMLE        7.44 ±3.10        2.51e5 ±1.52e5       2.52e5 ±1.52e5       0.00     9      0.53
MDE         2.64 ±0.08        16.19 ±0.43          23.15 ±0.59          0.86     3      0.53

Table 3. Comparison of the empirical robustness properties of the estimators at sample size n = 40 and with log-
transformation (2.5) used for the scale component; numbers following ± indicate CLT-based 95% confidence
intervals for the empirical values.

The simulation study confirms our findings of Section 4.5; entries in Table 3 follow the
same pattern as the ones of Table 1. This holds in particular for the ideal situation, and
for the efficiencies, where in the latter case Table 1 provides reasonable approximations
already for n = 100 [46, Tables 8, 9].

The ranking given by asymptotics is essentially valid already at sample size 40: as
predicted by asymptotic theory, RMXE and OMSE in their interpolated and IF-corrected
variant $\psi^{\sharp}$ at significance 95% are the best considered estimators as to MSE, although
MDE, MBRE, and Hybr come close as to eff.re.

By using Hybr as starting estimator the number of failures can be kept low: already at 
n = 40, it is less than 1% in the ideal model and about 3% under contamination. This is 
not true for MMed and MedkMAD, which suffer from up to 33% failure rate at this n 
under contamination. So Hybr is a real improvement. 

The results for sample size 40 are illustrated in boxplots in Figures 3(a) and 3(b),
respectively. In Figure 3(a), the underestimation of shape parameter $\xi$ by SMLE in the ideal
situation stands out; all other estimators in the ideal model are almost bias-free, while PE
is somewhat less precise. Under contamination (Figure 3(b)), all estimators are affected,
producing bias, most prominently in coordinate $\xi$. As expected, this effect is most
pronounced for MLE, which is completely driven away, while the other estimators, at least in
their medians, stay near the true parameter value.



6. Application to Danish Insurance Data 



In Figure 4 we illustrate the considered estimators, evaluating them on the Danish fire
insurance data set from R package evir [35]. This data set comprises 2167 large fire
insurance claims in Denmark from 1980 to 1990 collected at Copenhagen Reinsurance,
supplied by M. Rytgaard of Copenhagen Re, adjusted for inflation and expressed in
millions of Danish crowns (MDKK). For illustration purposes, we have chosen a threshold
of 1.88 MDKK, leaving us n = 1000 tail events. The values of estimates for shape
and scale parameters are plotted together with asymptotic 95% (CLT-based) confidence
intervals, denoted by filled points and solid arrows, respectively. To visualize stability of
the estimators against outliers on this data set, for radius r = 0.5, we artificially modify
the original data set to a contaminated one with $r\sqrt{n}$, or, after rounding, 15 outliers at
$10^{10}$ MDKK, i.e., an outlier rate of 1.5%. The respective estimates on the contaminated
data set are plotted with empty circles and confidence intervals with dashed arrows. For
the contaminated data, the confidence intervals are constructed to be bias-aware, i.e., with
$\sqrt{\mathrm{asMSE}}$ instead of $\sqrt{\mathrm{asVar}}$ as scale. From Figure 4 we can conclude that, as expected,
MLE is very sensitive to these 15 outliers, and that SMLE apparently tends to underestimate
the shape parameter. OMSE, RMXE, and MDE produce reliable values not only
for the original Danish data set, but also for the contaminated one. MBRE and, worse, PE
have a somewhat larger range of variation, and MMed and MedkMAD (which coincides
with Hybr here) perform quite well for scale, but worse than OMSE, RMXE, and MDE for
shape. Note that outliers at $10^{10}$ MDKK are not least favorable for PE and MMed.

[Figure 3: boxplot panels (a) ideal situation and (b) 7.9% contamination (corresponds to r = 0.5), sample size n = 40.]

Figure 3. Boxplots for MLE, PE, MMed, MedkMAD, Hybr, SMLE (with $\approx 0.7\cdot\sqrt{40}$ skipped values), MDE, MBRE,
OMSE estimators for shape $\xi$ and scale $\beta$ of the GPD at ideal (above) and contaminated data (below), (a),
(b); number of runs: 10000; the red dashed line is the true parameter value.



[Figure 4: "Estimators for $\beta$ and $\xi$ with 95%-CLT Confidence Intervals"; estimators shown: MLE, MBRE, OMSE, RMXE, PE, MMed, MedkMAD, SMLE, MDE.]

Figure 4. Confidence plots for MLE, MBRE, OMSE, RMXE, PE, MMed, MedkMAD/Hybr, SMLE (with $\approx 0.7\cdot\sqrt{1000}$
skipped values), MDE estimators for shape $\xi$ and scale $\beta$ of the GPD at ideal and contaminated
data (solid/dashed arrows). Confidence range for $\xi$ for MLE under contamination exceeds plotted region.
Data: Danish insurance data set from R package evir [35], threshold: 1.88 MDKK, sample size 1000,
contamination: 15 data points modified to $10^{10}$ MDKK.



7. Conclusion 



We have derived optimally robust estimators MBRE, OMSE, and RMXE for shape and
scale parameters $\xi$ and $\beta$ of the GPD on ideal and contaminated data. Their computation
has largely been accelerated by interpolation techniques.

Among the potential starting estimators, clearly MedkMAD in its variant Hybr excels
and comes closest to the aforementioned group. For the same purpose, PE is also robust,
but not really advisable due to its low breakdown point and non-convincing efficiencies;
the only reason for using PE is its ease of computation, which should not be so decisive.
Even worse is the popular SMLE without bias correction, which does provide some, but
much too little protection against outliers.

Asymptotic theory and empirical simulations show that Hybr, MedkMAD, MDE,
MBRE, OMSE, and RMXE estimators can withstand relatively high outlier rates as
expressed by an (E)FSBP of roughly 1/3 (compare R.& H. [46, 47]). SMLE in the variant
without bias correction as used in this paper, but with shrinking skipping rate, and MLE
have minimal FSBP of 1/n, hence should be avoided.

High failure rates for MMed and MedkMAD for small n, and under contamination limit 
their usability considerably, while Hybr works reliably. 

Looking at the influence functions, we see that, except for MLE, all estimators have 
bounded IFs, so finite GES, but do differ in how they use the information contained in an 
observation. 

This is reflected in asymptotic values, as well as in (simulated) finite sample values:
for known radius we can recommend OMSE with Hybr as initialization. It has the best
statistical properties in the simulations, is computationally fast, and is efficient for contamination
of known radius. MBRE and MDE come close to OMSE. For unknown radius, RMXE
is recommendable, with again OMSE, MBRE, Hybr, and MDE (in this order) as close
competitors.
All estimators are publicly available in R on CRAN.



Appendix A. Estimators 



For each of the estimators discussed in Section 4, we determine its IF, its asymptotic variance
asVar, its maximal asymptotic bias asBias, and its FSBP where possible. All estimators considered
in this appendix are defined in the original ($\beta$-)scale and equivariant in the sense of (2.4).



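A.1. Estimators Obtained as Minima or Maxima

Proposition A.1 (MLE)

IF $\mathrm{IF}_\theta(x;\mathrm{MLE},F_\theta) = \mathcal{I}_\theta^{-1}\Lambda_\theta(x)$, where, using the quantile-type representation (B1),

$$\mathcal{I}_\theta^{-1}\Lambda_\theta(x) = \frac{1+\xi}{\xi^2}\begin{pmatrix} -(\xi + \xi^2)\log(v) + (2\xi^2 + 3\xi + 1)v^{\xi} - (\xi^2 + 3\xi + 1) \\ \beta\,\big[\xi\log(v) - (2\xi^2 + 3\xi + 1)v^{\xi} + (3\xi + 1)\big] \end{pmatrix} \tag{A1}$$

MLE attains the smallest asymptotic variance among all ALEs.

asVar

$$\mathrm{asVar(MLE)} = \mathcal{I}_\theta^{-1} = (1 + \xi)\begin{pmatrix} \xi + 1 & -\beta \\ -\beta & 2\beta^2 \end{pmatrix} \tag{A2}$$

asBias Both components of the joint IF are unbounded, although only growing in absolute value
at rate log(x).

FSBP The FSBP of MLE is minimal, i.e., 1/n.

As we have seen, SMLE in fact does not estimate $\theta$ but $d(\theta) = \theta + B_\theta$, for bias $B_\theta$ already
present in the ideal model.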

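Proposition A.2 (SMLE)

IF The functional $T(F_\theta) := \mathrm{SMLE}(F_\theta) + B_\theta$ estimating $d(\theta)$ may be written as

$$T(F) = \frac{1}{1-\alpha}\int_0^{1-\alpha} \Lambda_\theta(F^{-1}(s))\,ds \tag{A3}$$

With $u_\alpha := F^{-1}(1 - \alpha)$, its IF is given by

$$\mathrm{IF}_\theta(x;T,F_\theta) = \begin{cases} \frac{1}{1-\alpha}\,[\Lambda_\theta(x) - W(F)], & x \le u_\alpha \\ \frac{1}{1-\alpha}\,[\Lambda_\theta(u_\alpha) - W(F)], & x > u_\alpha \end{cases} \tag{A4}$$

$$W(F) = (1-\alpha)T(F) + \alpha\Lambda_\theta(u_\alpha) \tag{A5}$$

asVar Numeric values can be obtained by integrating out $\mathrm{IF}_\theta(x;T,F_\theta)$.

asBias For shrinking rate $\alpha_n = r'/\sqrt{n}$, the asymptotic bias of SMLE is finite for each n, but, standardized
by $\sqrt{n}$, is of exact order log(n), hence unbounded. The bias induced by contamination
is dominated by $B_{n,\theta}$ eventually in n.

FSBP FSBP $= \alpha_n$ eventually in n.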

Proposition A.3 (MDE) 

IF For vfrom (Bl), the IF of MDE is given by 

W{xMDE,Fe) = 3{^+3)' (^M' -^n (^J)m^W))' Z^'" 

~ \ / 19+5'g , U.2]nr.(v\ 1 2-';,,2 1 ,.2+£\ 
Vl,l,Vl.2



for 
(A7)



125(5+2^)(5 + ^f V ^1.2' ^2.2 

Vi,i = 81 ('l6<^^ + 272<^'*+1694<^^ + 4853<^2 + 7276<^ +6245^2^ +9)"^ 

Vi,2 = -9j3 (4<^V86<^V 648.^^ + 2623^ +4535) (2<^ +9)"', 

V2,2 = i8^(26<^^+601<^2_^3154^+5255) 

asBias asBias(MDE) is finite. 

FSBP The FSBP ofMDE is at least 1 /2 of the optimal FSBP achievable in this context. An upper 
bound is given by 



e„ < min < 



- infi, ^ V. 



sup,, j <P. 



sup,, s (p.— inf,, c (f>_ ' sup,,£ ^.— infj.E ip. 



^,/3} 



(A8) 



To make the inequality in (A8) an equality, we would need to show that we cannot produce a
breakdown with less than this bound. Evaluating bound (A8) numerically gives a value of 4/9 + 
36%, which is achieved for v = (and (^ ^- 0) or, equivalently, letting the m replacing observations 
in Definition (3.17) tend to infinity. To see how realistic this value is compare Figure Al, where 
we produce an empirical max-bias-curve by simulations. 



FSBP for MDE 




Figure A1. Empirical bias for FSBP of MDE to CvM distance.

This bias is computed by simulating M = 100 samples of size n = 1000 from a GPD with $\xi = 0.7$, $\beta = 1$,
and after replacing m observations, for m = 1, ..., 400, by value $10^{10}$. There is a steep increase around
354, so we conjecture that (E)FSBP should be approximately 0.35.



A.2. Starting Estimators 



Proposition A.4 (PE) 
IF 



IF.(x;PE(fl),Fe)=^. ^2,3 ''■•'■(«) 



ai{a)-Hx<Qi{a)) 



^,P 



(A9) 



with deterministic (signed) weights hj(a) to given in the proof. 
asVar Abbreviating ai{a) by a;, 1 — «,■ by cCi, and h,i{a) by h,i, the asymptotic covariance for 
PE(a) is 



a.V„(PE,„„.^'(;;jJ«;.^)( - 



-1-2^ 



-l-«^-« 



a2a 



-i-'5--'5 



a 



3% «3«3 



a, 

-1-25 



/'5.2, h^^3 



J^I3,2' H,'i 



(AlO) 



asBias asBias(PE) is finite. 
FSBP $\varepsilon^*_n = \min\{1/a^2,\ \hat N_n^0/n\}$, for $\hat N_n^0 := \#\{X_i \mid 2\hat Q_2(a) < X_i < \hat Q_3(a)\}$.
ABP $\varepsilon^* = \varepsilon^*(a) = \min\{\pi_\xi(a),\ 1/a^2\}$ for $\pi_\xi(a) := (2a^{\xi} - 1)^{-1/\xi} - 1/a^2$.

For $\xi = 0.7$, the classical PE achieves an ABP of $\varepsilon^*(a = 2) = 6.42\%$; as to EFSBP, for n =
40, 100, 1000 we obtain $\varepsilon^* = 5.26\%$, $6.34\%$, $6.42\%$, respectively [47, Table 2].

Proposition A.5 (MMed) 

IF Let M(^) := A-Med{Fg^) = median (A9j;2 ° ^ej fh^ population median of the shape 

scores, U := -j^AQ^-2{qi), and m = m^ := Fg the population median. Then the level 

set {x G IR I Aei;2W < M{^)} is of form [qi{^),q2i^)] and IF(x;MMcd,Fe) = 
£)(IF(x;median,Fe),IF(x;A-Med,Fe))^ w/iere 

IF(x;median,Fe) - (i - D(.. < m)) //(m), IF(x;A-Med,Fe) = f^i^lf/f'}l^,\f/,, 

(All) 
and D is a corresponding deterministic Jacobian. 
asVar Let 

D-^ =EeXeAl for X8{x)^dpxe,{f), X6^x)= (i{x<m^)-l/2J{qi<x<q2)-l/2 

(A12) 
Then 

asVar(MMed) = \d (^^ -4Fiqi), ^ "T^^'^ ) ^' ^^^^^ 

asBias asBias(MMed) is finite.

We have not found analytic breakdown point values, neither for ABP nor for FSBP. While 50%
by scale equivariance is an upper bound, the high frequency of failures in the simulation study for
small sample sizes indicates that (E)FSBP should be considerably smaller; a similar study
for the empirical maxBias as the one for MDE shows that, for sample size n, from a rate of outliers of
$\varepsilon_n$ on, we only have failures in solving for MMed, for $\varepsilon_{40} = 42.5\%$, $\varepsilon_{100} = 35.0\%$, $\varepsilon_{1000} = 25.1\%$,
and $\varepsilon_{10000} = 20.1\%$. So we conjecture that the asymptotic breakdown point $\varepsilon^* < 20\%$.

Proposition A.6 (MedkMAD) 

IF Let G = G((^, j3); {M,m)) be the defining equations of MedkMAD, i.e.; 



G((^,j3);(M,m)) = (G^",G(^')'=(/m,^,/3;.W,j3^-mj (A14) 

and 

^=^\3l^J)) d{M.m) (A15) 

Then the IF of MedkMAD estimator is IF (x; MedkMAD, Fg) = 
D (IF (x; kMAD , Fg ) , IF (x; median, Fg)y where the IF ofkMAD is given by 



(-M<x-m<kM) , f(m+kM)-f(m-M) l(.t<m)-4 



ir^X,KiVlAU,rej - f{m+kM)-f{m-M) ^ kf(m+kM)+f{m-M) f{m) ^Aio; 

asVar Let as :~ f{m — M) +sf{m + kM), andd = fll[ +4(1 —a\)a^\f{m) and 

(Tu = (4/(m)) , a2.2 = j;;^j^y (Ti,2 = d,,! = 4/(„.K (^^^^ 



Then 



asVar(MedkMAD) = D^ ( ^'-i' ^'-^ ) D (A18) 



asBias asBias(McdkMAD) is finite. 
FSBP £* =min{A^;,,A^;;'}/« for 

N',=#{Xi\m <Xi<{k^\)m}, A^^ = [n/2] -#{X, | (l-i)m < X; < (fc^^ + l)m} 

(A19) 
and

-e*=mm(^Fgi{k+l)m) - i, Fgi{h+l)m) - Fei{l-k)m) - k) (A20) 

For $\xi = 0.7$, the EFSBP is given by the first alternative if $k < 3.23$ and by the second one
otherwise.

As to the choice of k, it turns out that a value of k = 10 gives reasonable values of ABP, asVar, 
asBias for a wide range of parameters ^, see R.& H. [46]. In the sequel this will be our reference 
value for k; as to EFSBP, for « = 40, 100, 1000 and <^ € R we obtain e* = 42.53%, 43.86%, 44.75%, 
respectively [47, Table 2]. Results on optimizing MedkMAD in k w.r.t. the different robustness 
criteria for £, = 0.7 can be looked up in R.& H. [46, Table 5]. 



Appendix B. Proofs 

To assess integrals in the GPD model, the following lemma is helpful; its proof follows
easily by noting that $v(z)$ introduced in it is just the quantile transformation of GPD(0, $\xi$, 1) up to
the flip $v \mapsto 1 - v$.

Lemma B.1: Let $X \sim \mathrm{GPD}(\mu,\xi,\beta)$ and let $z = z(x) = (x - \mu)/\beta$ and

$$v = v(z) = (1 + \xi z)^{-1/\xi} \tag{B1}$$

Then for $U \sim \mathrm{unif}(0, 1)$, we obtain $\mathcal{L}(v^{-1}(U)) = \mathrm{GPD}(0,\xi,1)$ and $\mathcal{L}(\beta v^{-1}(U) + \mu) = \mathcal{L}(X)$.
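The transformation in Lemma B.1 is easy to check numerically. The Python sketch below implements $v$ from (B1) together with its inverse $v^{-1}(u) = (u^{-\xi} - 1)/\xi$ (the explicit inverse is derived here from (B1) and is not stated in the text) and maps a uniform value to a GPD($\mu$, $\xi$, $\beta$) value; function names are illustrative.

```python
def v(z, xi):
    """v(z) = (1 + xi*z)^(-1/xi), the survival function of GPD(0, xi, 1)."""
    return (1.0 + xi * z) ** (-1.0 / xi)


def v_inv(u, xi):
    """Inverse of v on (0, 1]: z = (u^(-xi) - 1) / xi."""
    return (u ** (-xi) - 1.0) / xi


def gpd_from_uniform(u, mu, xi, beta):
    """Map u in (0, 1] to a GPD(mu, xi, beta) value via beta * v_inv(u) + mu."""
    return beta * v_inv(u, xi) + mu
```

Since $v$ is the survival function, $u = 1/2$ is mapped to the median $(2^{\xi} - 1)/\xi$ of GPD(0, $\xi$, 1), consistent with (4.7).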



Proof of Proposition 2.1: We start by differentiating the log-densities $f_\theta$ pointwise in x w.r.t.
$\xi$ and $\beta$ to obtain (2.2) and, using Lemma B.1, we obtain the expressions for (2.3), from where
we see finiteness and positive definiteness. As density $f_\theta$ is differentiable in $\theta$ and the
corresponding Fisher information is finite and continuous in $\theta$, by Hájek [20, App. A], this entails
$L_2$-differentiability. □

Proof of Lemma 2.2: For the first half of (2.7) let h = {h^,hp) and /i' = d„^li. We note that 
/e(x) = /e, (x/j3)/j3, hence fg+,,{x) = /e,+,'(x/j3)/j3. Then 



J {flil,Ay)'fl!'iy)i^ + 3^9. {y)h')ydy = o(|/zf ) = o{m 



,2 

dx^ 



So indeed, the L2-derivativeA0(,Y) is given by (f^'Ae^ {x/fi). Equation (2.9) is a consequence of the 
chain rule. This also entails the second half of (2.7): Ag(x) = daAg{x) = Ag^ (x//3) = Ag (x/j3). 
The assertions for ^g, J^g are simple consequences. D 
Proof of Proposition 3.3:

(a) Paralleling the proofs to Rieder [42, Thm.'s 5.5.7, 5.5.1, and Lem. 5.5.10], we see that the as- 
sertions of the theorems are also valid for general norms derived from quadratic forms; the only 
place leading to visible modification of the result is determining clipping height b of y/. In the 
proof of Thm. 5.5.1, the expression corresponding to tiA arises as E \f/'^do"Y = tvdg 'EYxif'^ = 

trdp^A. 

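(b) With the definitions of $A_\theta$, $a_\theta$, $b_\theta$ from (3.12), we obtain

\[ Y_\theta(x) = A_\theta\Lambda_\theta(x) - a_\theta = d_\beta A_{\theta_1} d_\beta\, d_\beta^{-1}\Lambda_{\theta_1}(x/\beta) - d_\beta a_{\theta_1} = d_\beta Y_{\theta_1}(x/\beta) \]

so in particular $n_\beta(Y_\theta(x)) = n_1(Y_{\theta_1}(x/\beta))$. For (3.11), we hence only have to check that, starting
with the optimal IF $\psi_{\theta_1} \in \Psi_2(\theta_1)$, function $\psi_\theta^{(0)}(x) := d_\beta\psi_{\theta_1}(x/\beta) \in \Psi_2(\theta)$ and solves (3.8)
respectively (3.9). By Lemma 2.2 and with $X' \sim \mathrm{GPD}(\theta_1)$ and $X = \beta X'$, we get

\[ \mathrm{E}_\theta\, \psi_\theta^{(0)}(X) = d_\beta\, \mathrm{E}_\theta\, \psi_{\theta_1}(X/\beta) = d_\beta\, \mathrm{E}_{\theta_1} \psi_{\theta_1}(X') = 0 \]

\[ \mathrm{E}_\theta\, \psi_\theta^{(0)}(X)\Lambda_\theta^\tau(X) = d_\beta\, \mathrm{E}_\theta\, \psi_{\theta_1}(X/\beta)\Lambda_{\theta_1}^\tau(X/\beta)\, d_\beta^{-1} = d_\beta\, \mathrm{E}_{\theta_1} \psi_{\theta_1}(X')\Lambda_{\theta_1}^\tau(X')\, d_\beta^{-1} = \mathbb{I}_2 \]

To see that $b_\theta = b_{\theta_1}$, for (3.8) we see that with $A' = d_\beta^{-1} A\, d_\beta^{-1}$ and $a' = d_\beta^{-1} a$

\[ b_\theta = \max_{A,a} \frac{\mathrm{tr}\, d_\beta^{-2} A}{\mathrm{E}_\theta\, n_\beta(A\Lambda_\theta(X) - a)}
 = \max_{A',a'} \frac{\mathrm{tr}\, A'}{\mathrm{E}_\theta\, n_1(A'\Lambda_{\theta_1}(X/\beta) - a')}
 = \max_{A',a'} \frac{\mathrm{tr}\, A'}{\mathrm{E}_{\theta_1} n_1(A'\Lambda_{\theta_1}(X') - a')} = b_{\theta_1} \]

while for (3.9) this follows from

\[ r^2 b_{\theta_1} = \mathrm{E}_{\theta_1}\big(n_1(Y_{\theta_1}(X')) - b_{\theta_1}\big)_+ = \mathrm{E}_\theta\big(n_1(Y_{\theta_1}(X/\beta)) - b_{\theta_1}\big)_+ = \mathrm{E}_\theta\big(n_\beta(Y_\theta(X)) - b_{\theta_1}\big)_+ \]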

(c) Similarly as in (b), denoting by $\tilde\Psi_2$ the set of IFs in the log-transformed model, we have to check
that starting from the optimal IF $\tilde\eta_{\theta_1} \in \tilde\Psi_2(\theta_1)$ function $\tilde\eta^{(0)}(x) := \tilde\eta_{\theta_1}(x/\beta) \in \tilde\Psi_2(\theta)$ and solves
(3.8) respectively (3.9); but by Lemma 2.2, this follows by analogous arguments as in (b).

(d) Again we have to show that for the optimally-robust IF $\psi_\theta \in \Psi_2(\theta)$ function $\eta^{(0)} := d_\beta^{-1}\psi_\theta \in \tilde\Psi_2(\theta)$
and solves (3.8) respectively (3.9) in the log-scale model; but by (2.9), this is shown like in (b).
□

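Proof of Lemma 3.4: Using the notation of the lemma, we set $\tilde\beta_n := \log\beta_n$, $\tilde\beta_n^{(0)} := \log\beta_n^{(0)}$, and
define $\tilde\theta_n^{(0)} := (\xi_n^{(0)}, \tilde\beta_n^{(0)})$. Then to given IF $\psi$ by the chain rule and (2.9), $\eta(x;\theta) := d_\beta^{-1}\psi(x;\theta)$
becomes an IF in the log-scale model. By construction (3.15), $\tilde\beta_n = \tilde\beta_n^{(0)} + \frac1n\sum_i \eta_2(X_i; \tilde\theta_n^{(0)})$, so

\[ \beta_n = \beta_n^{(0)} \exp\Big(\frac1n\sum_i \eta_2(X_i;\tilde\theta_n^{(0)})\Big) = \beta_n^{(0)} \exp\Big(\frac{1}{n\beta_n^{(0)}}\sum_i \psi_2(X_i;\theta_n^{(0)})\Big) \]

So $\beta_n > 0$ whenever $\beta_n^{(0)}$ is. In particular, if $\sup_x |\eta_2(x;\tilde\theta_n^{(0)})| = b < \infty$, the exp-term remains in
$[\exp(-b), \exp(b)]$, and hence breakdown (including implosion breakdown) can occur iff breakdown
has occurred in $\beta_n^{(0)}$. □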
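The positivity argument can be illustrated with a small numeric sketch. It assumes $d_\beta = \mathrm{diag}(1,\beta)$, so that $\psi_2 = \beta\,\eta_2$, and uses made-up values for the scale component of the IF; `one_step_log_scale` and `one_step_additive` are hypothetical helper names, not functions from the paper or from any package.

```python
import numpy as np

def one_step_log_scale(beta0, eta2_values):
    """One-step update on the log scale: beta = beta0 * exp(mean(eta_2))."""
    return beta0 * np.exp(np.mean(eta2_values))

def one_step_additive(beta0, psi2_values):
    """Plain additive one-step update: beta = beta0 + mean(psi_2)."""
    return beta0 + np.mean(psi2_values)

beta0 = 0.5                            # initial scale estimate, positive
b = 2.0                                # assumed bound on |eta_2|
eta2 = np.linspace(-2.0, -1.0, 50)     # illustrative IF values, all within [-b, b]

beta_log = one_step_log_scale(beta0, eta2)
beta_add = one_step_additive(beta0, beta0 * eta2)  # psi_2 = beta * eta_2 (assumption)

print(f"log-scale update: {beta_log:.4f}")  # stays in [beta0*exp(-b), beta0*exp(b)]
print(f"additive update:  {beta_add:.4f}")  # may leave the parameter space
```

With these illustrative values the additive update becomes negative, while the log-scale update stays positive and within the factor range $[\exp(-b), \exp(b)]$ of the initial estimate.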

Proof of Lemma 4.3: We first note that $\alpha_0 < x_0$, the positive zero of $x \mapsto \log(1 - x) + x + x^2$
(i.e., $x_0 = 0.6837$). By the asymptotic linearity of MLE, if we use a suitable (uniformly integrable)
initialization, the bias of SMLE has the asymptotic representation

\[ B_n = n_\beta\big(\mathrm{E}(\mathrm{SMLE}) - \theta\big) = \Big( \Big|\frac1n \sum_{k=1}^{\lceil\alpha_n n\rceil} \mathrm{E}\,\psi_\xi(V_{(k:n)})\Big|^2 + \Big|\frac1n \sum_{k=1}^{\lceil\alpha_n n\rceil} \mathrm{E}\,\psi_\beta(V_{(k:n)})\Big|^2 / \beta^2 \Big)^{1/2} \qquad \text{(B2)} \]

for $X_{(k:n)}$, $V_{(k:n)}$ the respective $k$th order statistic. Using (A1), we see that for $v \in (0,1)$, the
components of the IF of MLE may each be written as $a\log(v) + f(v)$, $a \neq 0$, and $f$ bounded on this
range. Hence the dominating term is $\log(v)$.
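Since the logarithmic term dominates, its expectation under the order statistics drives the size of the bias. The sketch below relies on the standard fact that the $k$th order statistic of $n$ uniforms is $\mathrm{Beta}(k, n-k+1)$-distributed, for which $|\mathrm{E}\log B| = \sum_{j=k}^{n} 1/j$; the helper name `abs_mean_log_beta` and the Monte Carlo sizes are illustrative choices.

```python
import math
import numpy as np

def abs_mean_log_beta(k, n):
    """Exact |E log B| for B ~ Beta(k, n-k+1), i.e. sum_{j=k}^{n} 1/j."""
    return sum(1.0 / j for j in range(k, n + 1))

n = 1000
rng = np.random.default_rng(0)
results = {}
for k in (1, 10, 100):
    exact = abs_mean_log_beta(k, n)
    # Monte Carlo cross-check of the closed form
    mc = float(-np.log(rng.beta(k, n - k + 1, size=200_000)).mean())
    results[k] = (exact, mc)
    print(f"k={k:4d}  exact={exact:.4f}  MC={mc:.4f}  log(n/k)={math.log(n / k):.4f}")
```

The expectation behaves like $\log(n/k)$, so it grows logarithmically in $n$ for the smallest order statistics, which is the mechanism behind the $\log$-type bounds for $B_n$ in this proof.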

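As the order statistics $V_{(k:n)}$ are Beta-distributed, we thus have to consider $|\mathrm{E}\log(B_{k,n})|$ for
$B_{k,n} \sim \mathrm{Beta}(k, n-k+1)$, $k = 1,\dots,\lceil\alpha_n n\rceil$. To this end, note that by the power series
expansion of $\log(1-x)$, for any $L > 0$ and any $x \in (0,1]$, $-\log(x) \ge \sum_{l=1}^{L}(1-x)^l/l$, while for
$0 < x < x_0$, $\log(1-x) \ge -x - x^2$. As $1 - B_{k,n} \sim \mathrm{Beta}(n-k+1, k)$, we further observe for $n > k$
that $\mathrm{E}(1-B_{k,n})^l = \prod_{j=1}^{l}(n+j-k)/(n+j)$, and that for any decreasing suitably integrable
function $f(x)$ with (indefinite) integral $F(x)$, $\sum_{j=1}^{n} f(j) \le \int_0^n f(x)\,dx = F(n) - F(0)$. Hence, using
$1 - x \le e^{-x}$ for $x \in \mathbb{R}$ we obtain

\begin{align*}
E_{k,n} := |\mathrm{E}\log(B_{k,n})| &\ge \sum_{l=1}^{L} \mathrm{E}(1-B_{k,n})^l/l = \sum_{l=1}^{L}\frac1l \prod_{j=1}^{l}\frac{n+j-k}{n+j} = \sum_{l=1}^{L}\frac1l \exp\Big(\sum_{j=1}^{l}\log\big(1-\tfrac{k}{n+j}\big)\Big) \\
&\ge \sum_{l=1}^{L}\frac1l \exp\Big(-\sum_{j=1}^{l}\big(\tfrac{k}{n+j} + \tfrac{k^2}{(n+j)^2}\big)\Big) \ge \sum_{l=1}^{L}\frac1l \exp\Big(-k\log\big(\tfrac{n+l}{n}\big) - \tfrac{k^2 l}{n^2}\Big) \\
&= \sum_{l=1}^{L}\frac1l \big(1-\tfrac{l}{n+l}\big)^k \exp\big(-\tfrac{k^2 l}{n^2}\big) \ge \sum_{l=1}^{L}\frac1l \big(1-\tfrac{L}{n+L}\big)^k \exp\big(-\tfrac{k^2 L}{n^2}\big) \ge \log(L)\big(1-\tfrac{L}{n+L}\big)^k \exp\big(-\tfrac{k^2 L}{n^2}\big)
\end{align*}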

Plugging in $L = [n/k]$, we obtain, eventually in $n$, $E_{k,n} \ge -\log(\alpha_n)\exp(-1-\alpha_n)$. On the other
hand, for $\beta_{1,n}$ the density of $\mathrm{Beta}(1,n)$, we split the integration range into $[0,1/n]$ and $[1/n,1]$ and
obtain

\[ E_{k,n} \le \int_0^1 -\log(x)\,\beta_{1,n}(x)\,dx \le n(\log(n)+1)/n + \log(n) \le 3\log(n) \]

if $n > 2$. Now, for some constants $d_1, d_2 > 0$ independent of $k$ and $n$,

I E xjf^ (B,,„ ) I = ^-^E,,„ +di- ^^^ , lEfp {Bk,„ ) I = ^£,,„ + ^2 - (3 - I ) 

Hence, as ^ ,2% < 3 + ^ ^ ', for liminf a„ < Oq we obtain, eventually in n 



< '^+^^^'|+'''+^'' a„(-log(a„/ao))exp(-l-a„) < 
<- I ^V((<^ + l)2 + j3-2)(£,,„-3-l/^)2< 

" k=l ^ 

, 1 [«""! 1 r«n«l V 1/2 

< ({- E Ev/^(B,,„)}2 + {- ^ Ev/^(B,,„)}V^2\ ^5^^ 



and liminfB„ > if liminfa,, > 0, respectively Uminfn^B,, > c«^a„ log(«) if liminfn^ a„ > 0. On 
the other hand, eventually in n (as the other summand terms of ij/ are bounded in «) 



B„ <4 ^2 «„log(n) 



n 



Proofs of the Propositions in the Appendix 
Proof of Proposition A.l (MLE): 

IF The IF of MLE in our context has already been obtained in various references, see e.g.
Smith [50]; as usual, we have $\mathrm{IF}_\theta(x;\mathrm{MLE},F_\theta) = \mathcal{I}_\theta^{-1}\Lambda_\theta(x)$. We have recalled the exact
terms in (A1) for later reference. Regularity conditions, e.g. van der Vaart [52, Thm. 5.39],
can easily be checked due to the smoothness of the score function and entail that MLE
attains the smallest asymptotic variance among all ALEs according to the Asymptotic
Minimax Theorem, Rieder [42, Thm. 3.3.8].

asVar Again, the asymptotic covariance of MLE for its use in the Cramér-Rao bound has already
been spelt out in other places, see e.g. [50].

asBias As $(\mathcal{I}_\theta^{-1})_{1,1}, (\mathcal{I}_\theta^{-1})_{2,1} \neq 0$, both components of the joint IF are unbounded; the growth
rate follows from (A1).

FSBP The assertion on FSBP follows easily by letting one observation tend to $\infty$. Admittedly,
for an actual finite sample, one can only approximate this breakdown with extremely large
contaminations. □

Proof of Proposition A.2 (SMLE): 

IF In fact, we follow the derivation of IFs to L-estimators in Huber [26, Ch. 3.3]. Up to
bias $B_n$ we are interested in the $\alpha$-trimmed mean of the scores, to which corresponds the
functional given in (A3). Using the underlying order statistics of the $X_i$, we obtain (A4)
and (A5) as in the cited reference.

asVar As $B_n$ is not random, the assertion is evident.

asBias The assertion on the size of the bias follows from Lemma 4.3. As the IF is bounded
locally uniformly in $\theta$, indeed the extra bias induced by contamination is dominated by $B_n$
eventually in $n$.

FSBP In our shrinking setting the proportion of the skipped data tends to 0, so it is the proportion
which delivers the active bound for the breakdown point: just replace $\lceil\alpha_n n\rceil + 1$ observations
by something sufficiently large and argue as for the MLE to show that FSBP $= \alpha_n$.
□

Proof of Proposition A.3 (MDE): 

IF We follow Rieder [42, Example 4.2.15, Thm. 6.3.8] and obtain $\mathrm{IF}(x;\mathrm{MDE},F_\theta) =: \mathcal{J}_\theta^{-1}(\varphi_\xi(x), \varphi_\beta(x))$
with $\varphi$ as in the proposition and $\mathcal{J}_\theta$ the CvM Fisher information as
defined, e.g. in Rieder [42, Definition 2.3.11], i.e.,



18(g+3) _3^ 

3J3, 2j32 



asVar The asymptotic covariance of the CvM minimum distance estimators can be found analytically
or numerically. Our analytic terms are cross-checked against numeric evaluations;
MAPLE scripts are available upon request for the interested reader.

asBias The fact that the IF is bounded follows e.g. from Rieder [42, Example 4.2.15, 4.2 eq.(55),
Thm. 6.3.8, Rem 6.3.9(a)].

FSBP Due to the lack of invariance in the GPD situation, Donoho and Liu [10, Propositions 4.1
and 6.4] only provide lower bounds for the FSBP, which is 1/2 the FSBP of the FSBP-optimal
procedure among all Fisher consistent estimators.

As MDE is a minimum of the smooth CvM distance, it has to fulfill the first order
condition for the corresponding M-equation, i.e., for $V_i = (1 + \frac{\xi}{\beta} X_i)^{-1/\xi}$,

\[ \sum_i \varphi_\xi(V_i;\xi) = 0, \qquad \sum_i \varphi_\beta(V_i;\xi) = 0 \]

Arguing as for the breakdown point of an M-estimator, except for the optimization in $\xi$,
we obtain (A8) as an analogue to Huber [26, Ch. 3, eqs. (2.39) and (2.40)].

In our shrinking setting the proportion of the skipped data tends to 0, so it is the proportion
which delivers the active bound for the breakdown point: just replace $\lceil\alpha_n n\rceil + 1$ observations
by something sufficiently large and argue as for the MLE to show that FSBP $= \alpha_n$.
□

Proof of Proposition A.4 (PE): 

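IF The IF of linear combinations $T_L$ of the quantile functionals $F^{-1}(\alpha_i) = T_i(F)$ for probabilities
$\alpha_i$ and weights $h_i$, $i = 1,\dots,k$ may be taken from Rieder [42, Ch. 1.5] and gives

\[ \mathrm{IF}(x;T_L,F_\theta) = \sum_{i=1}^{k} h_i \big(\alpha_i - \mathbb{I}(x \le F^{-1}(\alpha_i))\big)/f(F^{-1}(\alpha_i)) \]

Using the $\Delta$-method, the IF of PE($a$) hence is

\[ \mathrm{IF}(x;\mathrm{PE}(a),F_\theta) = \sum_{j=2,3} h_{\cdot,j}(a) \big(F_\theta(Q_j) - \mathbb{I}(x \le Q_j)\big)/f_\theta(Q_j) \]

with weights $h_{\cdot,j}(a)$ which for $Q_i = Q_i(a)$, $i = 2,3$ are given by

\[ h_{\xi,2}(a) = -\frac{1}{\log(a)}\,\frac{Q_3}{Q_2(Q_3-Q_2)}, \qquad
   h_{\beta,2}(a) = h_{\xi,2}(a)\,\frac{(Q_2)^2}{Q_3-2Q_2} + \frac{1}{\log(a)}\,\frac{2Q_2(Q_3-Q_2)}{(Q_3-2Q_2)^2}\,\log\frac{Q_3-Q_2}{Q_2} \]

\[ h_{\xi,3}(a) = \frac{1}{\log(a)}\,\frac{1}{Q_3-Q_2}, \qquad
   h_{\beta,3}(a) = h_{\xi,3}(a)\,\frac{(Q_2)^2}{Q_3-2Q_2} - \frac{1}{\log(a)}\,\frac{(Q_2)^2}{(Q_3-2Q_2)^2}\,\log\frac{Q_3-Q_2}{Q_2} \]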

asVar This follows from integrating out the IF.

asBias Boundedness of the IF is obvious from the terms just derived, so asBias is finite.

FSBP Terms for $\varepsilon^*$ are simple generalizations of R.& H. [47, Prop. 5.1]; $\bar\varepsilon^*$ follows from usual
LLN arguments. □
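To make the quantile-based construction concrete, the following sketch implements a Pickands-type estimator. It assumes the form $\hat\xi = \log((\hat Q_3 - \hat Q_2)/\hat Q_2)/\log a$ and $\hat\beta = \hat\xi\,\hat Q_2^2/(\hat Q_3 - 2\hat Q_2)$, with $\hat Q_2$, $\hat Q_3$ the empirical quantiles at levels $1 - 1/a$ and $1 - 1/a^2$; the definition of PE($a$) sits outside this appendix, so this form and all function names are assumptions for illustration.

```python
import numpy as np

def gpd_quantile(p, xi, beta):
    """Quantile function of GPD(0, xi, beta) for xi != 0."""
    return beta * ((1.0 - p) ** (-xi) - 1.0) / xi

def pickands_from_quantiles(q2, q3, a=2.0):
    """Map the two quantiles to (xi, beta); undefined if q3 == 2*q2."""
    xi = np.log((q3 - q2) / q2) / np.log(a)
    beta = xi * q2 ** 2 / (q3 - 2.0 * q2)
    return float(xi), float(beta)

def pickands(x, a=2.0):
    """Pickands-type estimate from the empirical quantiles at 1-1/a and 1-1/a^2."""
    q2 = np.quantile(x, 1.0 - 1.0 / a)
    q3 = np.quantile(x, 1.0 - 1.0 / a ** 2)
    return pickands_from_quantiles(q2, q3, a)

# Fisher consistency check: exact model quantiles recover the parameters.
xi0, beta0 = 0.7, 2.0
exact = pickands_from_quantiles(gpd_quantile(0.5, xi0, beta0),
                                gpd_quantile(0.75, xi0, beta0))

# Stability check: replace 5% of an ideal sample by a huge value.
rng = np.random.default_rng(0)
u = 1.0 - rng.uniform(size=2000)
x = beta0 * (u ** (-xi0) - 1.0) / xi0
x_cont = x.copy()
x_cont[:100] = 1e6
print("exact quantiles:", exact)
print("ideal sample:   ", pickands(x))
print("5% outliers:    ", pickands(x_cont))
```

Because only two central quantiles enter, the estimate moves but stays finite under this contamination, in line with the bounded IF derived above.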

Proof of Proposition A.5 (MMed): A general reference is Peng and Welsh [38]. 

IF The IF of MMed is a linear combination of the IF of the sample median already used for
the PE, and the IF of the median of the $\xi$-coordinate of the scores $\Lambda_\theta(X)$. The assertion on the level
sets of form $[q_1, q_2]$ follows from Peng and Welsh [38] or by plotting the respective IF for
actual $\xi$-values. More precisely, for $\xi = 0.7$ we obtain $q_1 = 0.3457$ and $q_2 = 2.5449$.
(A11) is a simple generalization of the IF to a general quantile and (A12) is entailed by
the $\Delta$-method. As $D$ does not depend on $x$, we may incorporate the standardizing term
involving evaluations of $f_\theta$ into $D$ and obtain the IF as $\mathrm{IF}(x;\mathrm{MMed},F_\theta) = D\chi_\theta$ with
$\chi_\theta$ from (A12).

asVar This follows from integrating out the IF.

asBias The IF of MMed is clearly bounded, so asBias is finite. □

Proof of Proposition A.6 (MedkMAD): 

IF By the implicit function theorem, the Jacobian in the $\Delta$-method
is $D$ from (A15). Hence by the $\Delta$-method, $\mathrm{IF}(x;\mathrm{MedkMAD},F_\theta) =
D\,(\mathrm{IF}(x;\mathrm{kMAD},F_\theta), \mathrm{IF}(x;\mathrm{median},F_\theta))^\tau$ where the IF of kMAD is a simple
generalization of the one for MAD, to be drawn e.g. from Rieder [42, Ch. 1.5]. For the
entries of $D$ we note






logMir ^ ^-if(^'-i) 



V=V- 

50(2) _ 13 /"of ,„„/oA 2«-l\ acP) _ 2«-l 



V=V- 



= ff2«log(2)-2i_i 



dM P ' dm p 



3gP) _ f) 3gP) 
dM ^' dm 



for 



:={l+^m+J2i\ 5^ ^,^fl+^,n^\ 7 



asVar With obvious generalizations, $\sigma_{i,j}$, $i,j = 1,2$, may be taken from Serfling and Mazumder
[49].

asBias Both IFs of median and kMAD are bounded, so the asymptotic bias of MedkMAD is
finite.

FSBP The assertions are shown in R.& H. [47, Prop. 5.2]. □